Open Source Tools for Mass Spectrometry Analysis
Overview:
This is a list intended to facilitate comparison of open source software for analyzing mass spectrometry data. The list comprises R packages and some other software and contains links to the home pages and a short description of the respective features.
Please email Sebastian Gibb if there are any inaccuracies, or to suggest additional packages.
R packages
CRAN (http://cran.r-project.org)
| Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors |
| MALDIquant | GPL (>=3) | 1.1 | raw data (mass and intensity); mzXML format (via readMzXmlData); Bruker *flex-series format (via readBrukerFlexData) | SNIP, Convex Hull, Moving Median | local maxima over SNR*MAD | intensity transformation and smoothing; total-ion-current calibration | landmark peaks that occur in most spectra are identified first; then a warping function is computed for each spectrum by fitting a local regression to the matched reference peaks | NA | peak labeling, diverse plots for calibrated mass spectra and peaks, merging of technical replicates, peak filtering, intensity matrix creation | Sebastian Gibb |
| caMassClass | GPL3 | 1.9 | CSV, mzXML | see PROcess | see PROcess; additionally: faster (uses C), different use of SNR, no AUC | scale peak heights by min-max (min=0, max=1 for each spectrum); avr-std (mean=0, unit variance); or med-mad (median=0, unit median absolute deviation) | based on Peakminer algorithm (Virginia Prostate Center), bins peaks with similar mass (±constant value) | LogitBoost from caTools; lda and qda from MASS; rpart from rpart | | Jarek Tuszynski |
| msProcess | GPL2 | 1.0.6 | Ciphergen XML | determine local minima and apply one of the following R functions: loess.smooth (default), spline, supsmu, approx, cummin, msSmoothMRD (msProcess, wavelet based) | local maxima; local maxima higher than estimated background (msPeaksSearch); continuous wavelets; discrete wavelets | TIC; Standard Normal Variate (SNV) transformation | hierarchical clustering; cluster by distance (smaller than threshold); vote; mrd (for details see msProcess documentation) | NA | in silico spectrometer; many denoising functions; additional data packages: msBreast, msDilution, msProstate | Lixin Gong, William Constantine, Yu Alex Chen |
| Peaks | LGPL | 0.2 | vector of intensities | SNIP | gaussian deconvolution | NA | NA | NA | | Miroslav Morhac |
| pkDACLASS | LGPL | 1.0 | 2-column data frame | see PROcess | monoisotopic peak detection (Poisson distribution + EM algorithm) | NA | round non-integer masses to integers and use the decimal fraction to weight their intensity | using randomForest | contains some datasets | Juliet Ndukum, Mourad Atlas, Susmita Datta |
| rTOFsPRO | GPL (>= 2) | 1.4.1 | lists generated by WMBrukerParser | estimate baseline by a linear, exponential or gaussian model; subtract a constant value | peak detection on the average spectrum (to use high-precision peak detection you have to contact the authors) | smoothing (moving average) | align peaks against peak list of the average spectrum (to use a global alignment+binning you have to contact the authors) | NA | everything is controlled by text files => very difficult interface | Dariya Malyarenko, Maureen Tracy, William Cooke |
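The "local maxima over SNR*MAD" entry in the MALDIquant row can be illustrated with a minimal Python sketch. It is not MALDIquant's code: the function names, the window-based definition of a local maximum, and the default parameters are assumptions made for illustration. Noise is estimated as the median absolute deviation (MAD) of all intensities, and a point counts as a peak when it is the largest value in its window and exceeds SNR times that noise estimate.

```python
import statistics


def mad(values):
    """Median absolute deviation, used here as a robust noise estimate."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def detect_peaks(intensities, half_window=2, snr=2.0):
    """Return indices of local maxima whose intensity exceeds snr * MAD.

    half_window and snr are illustrative parameters (assumptions),
    not defaults of any package listed in the table.
    """
    threshold = snr * mad(intensities)
    n = len(intensities)
    peaks = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        # A point is a local maximum if nothing in its window is larger.
        if intensities[i] == max(intensities[lo:hi]) and intensities[i] > threshold:
            peaks.append(i)
    return peaks


spectrum = [1, 3, 2, 4, 20, 3, 2, 1, 3, 2, 15, 4, 2]
print(detect_peaks(spectrum, half_window=2, snr=5.0))
```

In this simple form every point of a flat plateau is reported as a peak, so real data would usually be smoothed first.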
Bioconductor Mass Spectrometry Packages (http://bioconductor.org/packages/release/BiocViews.html#__MassSpectrometry)
| Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors |
| MassSpecWavelet | LGPL (>= 2) | 1.22.0 | vector of intensities | NN; additional Savitzky-Golay Algorithm | continuous wavelet | NA | NA | NA | | Pan Du, Warren Kibbe, Simon Lin |
| MSnbase | Artistic-2.0 | 1.4 | ispy2, mgf, netCDF, mzData, mzML, mzXML (mostly via mzR) | NA | NA | sum, max, quantiles, vsn | NA | NA | a lot of annotations are possible; methods for cleaning spectra | Laurent Gatto |
| PROcess | Artistic-2.0 | 1.32 | 2-column matrix (cols: 1st: m/z, 2nd: intensities) | local minimum (or user-defined quantile) + loess | local maximum (optional smoothing (moving average)) | median of all TIC | intersection graphs | intersection graphs | | Xiaochun Li |
| TargetSearch | GPL (>= 2) | 1.12 | NetCDF | divide the spectrum into subparts and calculate the standard deviation; values a user-defined percentage above the standard deviation are treated as true signal | smooth spectrum, determine sign changes; using PPC | median of all TIC; none | NA | NA | GC/MS | Alvaro Cuadros-Inostroza, Jan Lisec, Henning Redestig, Matt Hannah |
| xcms | GPL (>= 2) | 1.32 | NetCDF/mzXML/mzData/mzML files | constant threshold; Savitzky-Golay Algorithm or none (depending on PD method) | centroid-based wavelet (for LC/MS); continuous wavelet (using MassSpecWavelet, for MS) | NA | construct a master peak list and align by best match; heuristic clustering | NA | database search possible; write support for mzData and NetCDF; many functions for LC/MS | Colin A. Smith, Ralf Tautenhahn, Steffen Neumann, Paul Benton |
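The "median of all TIC" normalization listed for PROcess and TargetSearch can be sketched as follows. This is an illustrative Python reading of the table entry, not code from either package: it assumes that the TIC of a spectrum is the sum of its intensities and that each spectrum is rescaled so its TIC equals the median TIC across all spectra. The function name is hypothetical.

```python
import statistics


def normalize_to_median_tic(spectra):
    """Rescale each spectrum so its total ion current equals the median TIC.

    spectra is a list of intensity lists. Assumption: TIC = sum of intensities.
    """
    tics = [sum(s) for s in spectra]
    if any(t <= 0 for t in tics):
        raise ValueError("every spectrum needs a positive total ion current")
    target = statistics.median(tics)
    return [[x * target / t for x in s] for s, t in zip(spectra, tics)]


spectra = [[1, 1, 2], [2, 2, 4], [4, 4, 8]]
for normalized in normalize_to_median_tic(spectra):
    print(normalized, sum(normalized))
```

After rescaling, all spectra share the same TIC, so intensities become comparable between spectra that were acquired with different overall signal levels.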
Other R Packages
| Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors |
| PPC | GPL2 | 1.02 | CSV | NN | local maximum above noise estimated by Friedman's super smoother | log-transformation + linear transformation (10th percentile becomes 0; 90th becomes 1) | hierarchical clustering | nearest shrunken centroids (PPC) | | Balasubramanian Narasimhan, R. Tibshirani, T. Hastie |
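The normalization described in the PPC row (log-transformation followed by a linear transformation that maps the 10th percentile to 0 and the 90th to 1) might look like the sketch below. It is not taken from PPC: the function name is hypothetical, and the percentile definition (linear interpolation via Python's `statistics.quantiles` with `method="inclusive"`) is an assumption, since the table does not say how PPC computes percentiles. The base of the logarithm does not matter here because it cancels in the linear rescaling.

```python
import math
import statistics


def log_percentile_scale(intensities):
    """Log-transform intensities, then map the 10th percentile to 0 and the 90th to 1."""
    if min(intensities) <= 0:
        raise ValueError("intensities must be positive for the log transform")
    logs = [math.log(x) for x in intensities]
    # n=10 yields nine cut points; the first is the 10th percentile, the last the 90th.
    deciles = statistics.quantiles(logs, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    if p90 == p10:
        raise ValueError("10th and 90th percentiles coincide; cannot rescale")
    return [(v - p10) / (p90 - p10) for v in logs]


values = [10.0 ** k for k in range(11)]
print([round(v, 3) for v in log_percentile_scale(values)])
```

Values below the 10th percentile become negative and values above the 90th exceed 1; the sketch leaves them unclipped.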
R Packages for importing mass spectrometry data files
| Package | License | Version | File Formats | Miscellaneous | Authors |
| mzR | Artistic-2.0 | 1.2.1 | mzXML, mzData, mzML, NetCDF | | Bernd Fischer, Steffen Neumann, Laurent Gatto |
| readBrukerFlexData | GPL | 1.3 | fid files of Bruker Daltonics' *flex series | | Sebastian Gibb |
| readMzXmlData | GPL | 2.3 | mzXML | | Sebastian Gibb |
Non-R tools
| Application | Programming Language | Operating Systems | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors |
| mMass | Python | L, M, W | GPL3 | 5.0.1 | mzData, mzXML, mzML, ASCII, CSV, fid (Bruker Daltonics' compassXport has to be installed (W only)) | median of all intensities minus median of absolute deviations (additionally, a relative offset can be added and the baseline smoothed); gaussian smoothing | local maximum above (relative and absolute) intensity threshold | intensity/max_intensity (range: 0-1) | NA | NA | deisotoping function, connections to many protein databases, batch processing; see also the complete feature list | Martin Strohalm |
| MZmine2 | Java | L, M, W | GPL2 | | | | | | | | | Matej Orešič et al (full list) |
| OpenChrom | Java | L, M, W | EPL | 0.6 | NetCDF, mzXML, CSV, D (Agilent Technologies), own file format *.chrom | moving minimum | zero of first derivative of TIC signal | NA | NA | NA | batch processing; smoothing filter: Savitzky-Golay; extendable by plugins; database-based identification possible (as plugin, NIST-DB) | Philip Wenig |
| OpenMS/TOPP | C++ | L, M, W | LGPL | | | | | | | | | Knut Reinert, Oliver Kohlbacher, Andreas Hildebrandt and many others (full list) |
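A "moving minimum" baseline, as listed for OpenChrom, can be illustrated with the following Python sketch. It is a generic reading of the term and not OpenChrom's implementation: the function names and the `half_window` parameter are assumptions. The baseline at each point is taken as the smallest intensity within a window around that point, and the corrected signal is the intensity minus that baseline.

```python
def moving_minimum_baseline(intensities, half_window=1):
    """Estimate the baseline as the minimum intensity in a sliding window."""
    n = len(intensities)
    baseline = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        baseline.append(min(intensities[lo:hi]))
    return baseline


def subtract_baseline(intensities, half_window=1):
    """Return baseline-corrected intensities (never negative by construction)."""
    baseline = moving_minimum_baseline(intensities, half_window)
    return [x - b for x, b in zip(intensities, baseline)]


signal = [5, 6, 9, 6, 5, 7, 12, 7, 6]
print(moving_minimum_baseline(signal))
print(subtract_baseline(signal))
```

The window should be wider than the broadest peak; otherwise the baseline follows the peaks themselves and the correction removes real signal.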
Non-R tools for importing mass spectrometry data files
| Application | License | Version | File Formats | Miscellaneous | Authors |
| pymzML | LGPL | 0.7.4 | mzML | Python 2.6.5/Python 3 | Till Bald, Johannes Barth, Anna Niehues, Michael Specht, Michael Hippler, Christian Fufezan |
Abbreviations:
| AUC | area under the curve |
| BC | baseline correction |
| CL | classification |
| DN | denoising |
| PA | peak alignment |
| PD | peak detection |
| SNR | signal to noise ratio |
| TIC | total ion current/total ion count |
| GC/MS | gas chromatography/mass spectrometry |
| LC/MS | liquid chromatography/mass spectrometry |
| MS | mass spectrometry |
| NA | not available |
| NN | not needed |
| L | Linux |
| M | Mac OS X |
| W | Microsoft Windows |
Last modified:
2012-05-06