Overview of methods developed in the group
Timeline of methods developed in our group

Our research focuses on the development of state-of-the-art statistical and machine learning methods and tools for the analysis of high-dimensional biomolecular data. Correspondingly, over the last few years we have contributed to the development of methods in the areas listed below.

For reproducibility, we make our methods available as free software, mostly for the R platform.

See also the ISI ResearcherID and Google Scholar pages.

Please find below a representative selection of our publications. For a complete list of publications and current preprints, see the publication list of Korbinian Strimmer and the web pages of the other group members.

High-dimensional data analysis:
Overview of our toolbox for analyzing high-dimensional data
We are interested in methodology for high-dimensional inference, including Bayesian learning, regularization and shrinkage methods. We study multiple testing approaches (such as false discovery rates and higher criticism), model and variable selection for regression and classification, and computationally efficient algorithms for large-scale data. Our methods are implemented in a toolbox for high-dimensional data analysis.

A common feature of our R packages is that they avoid computationally expensive procedures in favor of scalable algorithms with low complexity. Specifically, we rely on analytic approximations where possible (e.g., for optimizing tuning parameters) and use efficient analytic model selection strategies (higher criticism, CAR and CAT scores, etc.). This toolbox is used, e.g., in our software for gene network analysis (GeneNet), gene ranking (st) and mass spectrometry (MALDIquant).
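
To illustrate what "analytic approximation of tuning parameters" can look like, here is a minimal Python sketch (not the R packages' code) of a James-Stein-type shrinkage correlation estimator. The empirical correlation matrix is shrunk toward the identity, and the shrinkage intensity lambda is computed in closed form from the data instead of by cross-validation. The function name `shrink_correlation` and the choice of the identity as target are assumptions made for this example.

```python
import numpy as np


def shrink_correlation(x):
    """Shrink the empirical correlation matrix of x (n samples, p variables)
    toward the identity, using an analytically estimated intensity lambda.

    Illustrative sketch only; memory use is O(n * p^2)."""
    n, p = x.shape
    # Standardize each column: mean 0, unbiased standard deviation 1.
    xs = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Empirical correlation matrix (its diagonal is exactly 1).
    r = xs.T @ xs / (n - 1)
    # w[k, i, j] = xs[k, i] * xs[k, j]; its spread over samples k
    # gives an unbiased-style estimate of Var(r_ij).
    w = xs[:, :, None] * xs[:, None, :]
    w_bar = w.mean(axis=0)
    var_r = n / (n - 1) ** 3 * ((w - w_bar) ** 2).sum(axis=0)
    # Only off-diagonal entries are shrunk (the target's diagonal is 1).
    off = ~np.eye(p, dtype=bool)
    denom = (r[off] ** 2).sum()
    lam = 1.0 if denom == 0 else var_r[off].sum() / denom
    lam = min(1.0, max(0.0, lam))  # truncate to [0, 1]
    r_shrink = (1.0 - lam) * r
    np.fill_diagonal(r_shrink, 1.0)
    return r_shrink, lam


# Usage: more variables than samples, where the raw estimate is singular.
rng = np.random.default_rng(0)
r_s, lam = shrink_correlation(rng.standard_normal((10, 50)))
```

With p > n the empirical correlation matrix is singular, but for any lambda > 0 the shrunk matrix has all eigenvalues at least lambda and can therefore be inverted, which is what downstream steps such as partial correlations or whitening require.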

See the example R scripts whiten.R and whiten-example.R.

Recently, we have also identified the unique optimal whitening procedure for variable selection, which shows that the previously introduced CAT and CAR scores are optimal.
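
Whitening replaces correlated variables by uncorrelated ones with unit variance. The Python sketch below assumes that the procedure referred to above is ZCA-cor (correlation-based Mahalanobis) whitening, with whitening matrix W = R^{-1/2} V^{-1/2}, where R is the correlation matrix and V is the diagonal matrix of variances. Under that assumption, CAT scores are t-scores decorrelated by R^{-1/2}. The function names are hypothetical. The demo needs n > p so that R is invertible; in high dimensions a shrinkage estimate of R would be plugged in instead.

```python
import numpy as np


def inv_sqrt_psd(m):
    """Inverse square root of a symmetric positive definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def zca_cor_whiten(x):
    """ZCA-cor whitening, W = R^{-1/2} V^{-1/2}; x has shape (n, p) with n > p."""
    xc = x - x.mean(axis=0)
    sigma = np.cov(xc, rowvar=False)         # empirical covariance
    sd = np.sqrt(np.diag(sigma))             # diagonal of V^{1/2}
    r = sigma / np.outer(sd, sd)             # correlation matrix R
    w = inv_sqrt_psd(r) @ np.diag(1.0 / sd)  # whitening matrix W
    return xc @ w.T                          # each row is z = W x


def cat_scores(t, r):
    """CAT-style scores: t-scores decorrelated with R^{-1/2} (assumed form)."""
    return inv_sqrt_psd(r) @ np.asarray(t, dtype=float)
```

Among all whitening matrices, ZCA-cor keeps each whitened variable as close as possible to its original standardized counterpart. As a result, the decorrelated scores can still be read gene by gene.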

Transcriptome and proteome analysis:
Example of a protein mass spectrum
We are interested in statistical bioinformatics approaches to analyze gene expression and proteomics data. To this end we have developed a versatile platform for computational mass spectrometry. Another current interest is the analysis of RNA-Seq data.

Gene ranking and biomarker discovery:
CAR regression models for diabetes data
We are interested in biomarker discovery and have recently proposed the CAT and CAR scores for ranking correlated genes. In addition, we introduced the shrinkage t statistic, a regularized t-score useful for high-dimensional data analysis with small samples.
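
The idea behind a regularized t-score can be sketched as follows. With few samples per gene, individual variance estimates are noisy. Each gene's variance is therefore shrunk toward the median variance across genes, with an analytically estimated intensity, and the shrunk variances are then used in the t-score denominator. This Python sketch is illustrative only and is not the `st` package. Two choices are assumptions made here: the median target, and the Welch-type (unpooled) two-sample denominator.

```python
import numpy as np


def shrink_variances(x):
    """Shrink per-gene variances of x (n samples, p genes) toward their median."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    v = (xc ** 2).sum(axis=0) / (n - 1)  # unbiased variances s_k^2
    w = xc ** 2
    # Estimated sampling variance of each s_k^2.
    var_v = n / (n - 1) ** 3 * ((w - w.mean(axis=0)) ** 2).sum(axis=0)
    target = np.median(v)
    denom = ((v - target) ** 2).sum()
    lam = 1.0 if denom == 0 else min(1.0, var_v.sum() / denom)
    return lam * target + (1.0 - lam) * v, lam


def shrinkage_t(x1, x2):
    """Two-sample t-scores per gene using shrunk variances (Welch-type, assumed)."""
    v1, _ = shrink_variances(x1)
    v2, _ = shrink_variances(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    return diff / np.sqrt(v1 / x1.shape[0] + v2 / x2.shape[0])
```

Each shrunk variance lies between its raw value and the median. This stabilizes genes whose variance happens to be tiny in a small sample, which otherwise tend to dominate an ordinary t-score ranking.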

Signal identification and FDR:
Local FDR thresholds and natural class boundary
We have developed statistical approaches for detecting signal in high-dimensional genomic data and for multiple testing using false discovery rates (FDR).
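
For readers new to FDR control, the classic Benjamini-Hochberg step-up procedure is sketched below in Python as a reference point. It controls the tail-area FDR and is not the group's local FDR methodology. The function name and the default level q = 0.05 are choices made for this example.

```python
import numpy as np


def benjamini_hochberg(pvalues, q=0.05):
    """Boolean mask of hypotheses rejected at FDR level q (BH step-up)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # q * i / m for i = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: find the largest i with p_(i) <= q * i / m and reject
        # all hypotheses with rank <= i, even if some earlier ranks failed.
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject
```

For example, with p-values (0.02, 0.001, 0.04, 0.049) all four hypotheses are rejected at q = 0.05. The third-smallest p-value misses its own threshold, but the largest one meets q * 4 / 4, and the step-up rule takes the largest passing rank.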

Graphical models and biological networks:
Entropy-based gene association network
In our group we have developed a series of algorithms that use graphical models to learn large-scale gene association networks from high-throughput data.
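
In a Gaussian graphical model, two genes are conditionally independent given all others exactly when their partial correlation is zero. Partial correlations follow from the inverse of the correlation matrix. The Python sketch below computes them and lists candidate edges. The fixed `threshold` and the `edge_list` helper are hypothetical simplifications. In practice the input correlation matrix would typically be a regularized (e.g. shrinkage) estimate, and edges would be selected with a calibrated procedure such as an FDR-based test rather than a fixed cutoff.

```python
import numpy as np


def partial_correlations(r):
    """Partial correlation matrix from a positive definite correlation matrix."""
    omega = np.linalg.inv(r)  # precision (concentration) matrix
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def edge_list(pcor, threshold):
    """Hypothetical helper: pairs (i, j), i < j, with |pcor| above threshold."""
    p = pcor.shape[0]
    return [(i, j, pcor[i, j])
            for i in range(p) for j in range(i + 1, p)
            if abs(pcor[i, j]) > threshold]


# Usage: a chain x0 -> x1 -> x2. x0 and x2 are correlated (rho**2) but
# conditionally independent given x1, so their partial correlation is 0.
rho = 0.5
r = np.array([[1.0, rho, rho ** 2], [rho, 1.0, rho], [rho ** 2, rho, 1.0]])
edges = edge_list(partial_correlations(r), threshold=0.1)
```

The example shows why association networks are built on partial rather than ordinary correlations. The marginal correlation between x0 and x2 is 0.25, but the indirect link disappears once x1 is conditioned on, so only the edges (0, 1) and (1, 2) remain.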

Molecular evolution:
Reversible jump MCMC estimate of population size
Among our first research interests were methods for phylogenetic analysis and population genetics using sequence data.

We are hosting the meeting Statistical Methods for Postgenomic Data (SMPGD 2017) at Imperial College London. Previously, we helped organize the life science session at GOCPS 2010 and the Computational Systems Biology (WCSB 2008) conference at the University of Leipzig. In Munich, our group co-organized the workshop Complex Stochastic Systems in Biology and Medicine 2004.