Open Source Tools for Mass Spectrometry Analysis

Overview:

This list is intended to facilitate comparison of open source software for analyzing mass spectrometry data. It comprises R packages and some other software, with links to the home pages and a short description of each tool's features.

Please email Sebastian Gibb to report inaccuracies or to suggest additional packages.

R packages

CRAN (http://cran.r-project.org)

Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors
MALDIquant | GPL (>= 3) | 1.1 | raw data (mass and intensity); mzXML format (via readMzXmlData); Bruker *flex-series format (via readBrukerFlexData) | SNIP, Convex Hull, Moving Median | local maxima over SNR*MAD | intensity transformation and smoothing; total-ion-current calibration | landmark peaks that occur in most spectra are identified first; a warping function is then computed for each spectrum by fitting a local regression to the matched reference peaks | NA | peak labeling, diverse plots for calibrated mass spectra and peaks, merging of technical replicates, peak filtering, intensity matrix creation | Sebastian Gibb
caMassClass | GPL3 | 1.9 | CSV, mzXML | see PROcess | see PROcess; additionally: faster (uses C), different use of SNR, no AUC | adjust peak height to min-max: min=0, max=1 for each spectrum; avr-std: mean=0, unit variance; med-mad: median=0, unit median absolute deviation | based on the Peakminer algorithm (Virginia Prostate Center); bins peaks with similar mass (±constant value) | LogitBoost from caTools; lda and qda from MASS; rpart from rpart | | Jarek Tuszynski
msProcess | GPL2 | 1.0.6 | Ciphergen XML | determine local minima and apply one of the following R functions: loess.smooth (default), spline, supsmu, approx, cummin, msSmoothMRD (msProcess, wavelet based) | local maxima; local maxima higher than estimated background (msPeaksSearch); continuous wavelets; discrete wavelets | TIC; Standard Normal Variate (SNV) transformation | hierarchical clustering; cluster by distance (smaller than threshold); vote; mrd (for details see the msProcess documentation) | NA | in silico spectrometer; many denoising functions; additional data packages: msBreast, msDilution, msProstate | Lixin Gong, William Constantine, Yu Alex Chen
Peaks | LGPL | 0.2 | vector of intensities | SNIP | gaussian deconvolution | NA | NA | NA | | Miroslav Morhac
pkDACLASS | LGPL | 1.0 | 2-column dataframe | see PROcess | monoisotopic peak detection (Poisson distribution + EM algorithm) | NA | rounds non-integer masses to integers and uses the decimal fraction to weight their intensity | using randomForest | contains some datasets | Juliet Ndukum, Mourad Atlas, Susmita Datta
rTOFsPRO | GPL (>= 2) | 1.4.1 | lists generated by WMBrukerParser | estimate baseline by a linear, exponential or gaussian model; subtract a constant value | peak detection on the average spectrum (to use high-precision peak detection you have to contact the authors) | smoothing (moving average) | align peaks against the peak list of the average spectrum (to use a global align+binning you have to contact the authors) | NA | everything is controlled by text files => very difficult interface | Dariya Malyarenko, Maureen Tracy, William Cooke

Bioconductor Mass Spectrometry Packages (http://bioconductor.org/packages/release/BiocViews.html#__MassSpectrometry)

Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors
MassSpecWavelet | LGPL (>= 2) | 1.22.0 | vector of intensities | NN; additional Savitzky-Golay Algorithm | continuous wavelet | NA | NA | NA | | Pan Du, Warren Kibbe, Simon Lin
MSnbase | Artistic-2.0 | 1.4 | ispy2, mgf, netCDF, mzData, mzML, mzXML (mostly via mzR) | NA | NA | sum, max, quantiles, vsn | NA | NA | many annotations are possible; methods for cleaning spectra | Laurent Gatto
PROcess | Artistic-2.0 | 1.32 | 2-column matrix (cols: 1st: m/z, 2nd: intensities) | local minimum (or user-defined quantile) + loess | local maximum (optional smoothing (moving average)) | median of all TIC | intersection graphs | intersection graphs | | Xiaochun Li
TargetSearch | GPL (>= 2) | 1.12 | NetCDF | divide spectrum into subparts, calculate the standard deviation; a user-defined percentage above the standard deviation becomes true signal | smooth spectrum, determine sign changes; using PPC | median of all TIC; none | NA | NA | GC/MS | Alvaro Cuadros-Inostroza, Jan Lisec, Henning Redestig, Matt Hannah
xcms | GPL (>= 2) | 1.32 | NetCDF/mzXML/mzData/mzML files | constant threshold; Savitzky-Golay Algorithm or none (depending on PD method) | centroid base wavelet (for LC/MS); continuous wavelet (using MassSpecWavelet, for MS) | NA | construct a master peak list and align by best match; heuristic clustering | NA | database search possible; write support for mzData and NetCDF; many functions for LC/MS | Colin A. Smith, Ralf Tautenhahn, Steffen Neumann, Paul Benton
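
The "median of all TIC" normalization listed for PROcess and TargetSearch can be read as rescaling every spectrum so that its total ion current equals the median TIC of the whole set. The Python sketch below shows that reading only; it is an assumption about what the table entry means, not code taken from either package, and the function name and data are hypothetical.

```python
from statistics import median


def normalize_to_median_tic(spectra):
    """Rescale each spectrum so its TIC equals the median TIC of all spectra.

    spectra is a list of intensity lists; the TIC is taken as the plain
    sum of intensities (an assumption, some tools integrate over m/z).
    """
    tics = [sum(spectrum) for spectrum in spectra]
    if any(tic == 0 for tic in tics):
        raise ValueError("cannot normalize a spectrum with zero TIC")
    target = median(tics)
    return [
        [value * target / tic for value in spectrum]
        for spectrum, tic in zip(spectra, tics)
    ]


# Three hypothetical spectra that differ only by a scale factor.
spectra = [[1, 2, 3], [2, 4, 6], [4, 8, 12]]
normalized = normalize_to_median_tic(spectra)
print(normalized)
```

After this step, spectra that differ only in overall signal strength become directly comparable.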

Other R Packages

Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors
PPC | GPL2 | 1.02 | CSV | NN | local maximum above noise estimated by Friedman's super smoother | log-transformation + linear transformation (10th percentile becomes 0; 90th, 1) | hierarchical clustering | nearest shrunken centroids (PPC) | | Balasubramanian Narasimhan, R. Tibshirani, T. Hastie
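
The normalization described for PPC (log-transformation followed by a linear transformation that maps the 10th percentile to 0 and the 90th to 1) can be sketched in Python as follows. This is an illustration of the description in the table, not PPC's own code; the helper names and the linear-interpolation quantile rule are assumptions.

```python
import math


def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics
    (assumed rule; other quantile definitions give slightly different values)."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def log_percentile_scale(intensity, low=0.10, high=0.90):
    """Log-transform, then rescale linearly so that the low quantile
    maps to 0 and the high quantile maps to 1."""
    logged = [math.log(value) for value in intensity]  # needs positive values
    ordered = sorted(logged)
    p_low = quantile(ordered, low)
    p_high = quantile(ordered, high)
    span = p_high - p_low
    if span == 0:
        raise ValueError("low and high quantiles coincide")
    return [(value - p_low) / span for value in logged]


# Hypothetical intensities spanning several orders of magnitude.
intensity = [math.exp(k) for k in range(11)]
scaled = log_percentile_scale(intensity)
print([round(value, 3) for value in scaled])
```

Values below the 10th percentile come out negative and values above the 90th exceed 1, since the transformation is linear and does not clip.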

R Packages for importing mass spectrometry data files

Package | License | Version | File Formats | Miscellaneous | Authors
mzR | Artistic-2.0 | 1.2.1 | mzXML, mzData, mzML, NetCDF | | Bernd Fischer, Steffen Neumann, Laurent Gatto
readBrukerFlexData | GPL | 1.3 | fid files of Bruker Daltonics' *flex series | | Sebastian Gibb
readMzXmlData | GPL | 2.3 | mzXML | | Sebastian Gibb

Non-R tools

Application | Programming Language | Operating Systems | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors
mMass | Python | L, M, W | GPL3 | 5.0.1 | mzData, mzXML, mzML, ASCII, CSV, fid (Bruker Daltonics' compassXport has to be installed (W only)) | median of all intensities minus median of absolute deviations (additionally, a relative offset can be added and the baseline smoothed); gaussian smoothing | local maximum above (relative and absolute) intensity threshold | intensity*1/max_intensity (range: 0-1) | NA | NA | deisotoping function, connections to many protein databases, batch processing; see also: complete feature list | Martin Strohalm
MZmine2 | Java | L, M, W | GPL2 | | | | | | | | | Matej Orešič et al. (full list)
OpenChrom | Java | L, M, W | EPL | 0.6 | NetCDF, mzXML, CSV, D (Agilent Technologies), own file format *.chrom | moving minimum | zero of the first derivative of the TIC signal | NA | NA | NA | batch processing; smoothing filter: Savitzky-Golay; extendable by plugins; database-based identification possible (as plugin, NIST-DB) | Philip Wenig
OpenMS/TOPP | C++ | L, M, W | LGPL | | | | | | | | | Knut Reinert, Oliver Kohlbacher, Andreas Hildebrandt and many others (full list)
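
The baseline and normalization entries in the mMass row can be illustrated with a short Python sketch. It treats the baseline as a single constant level (median of all intensities minus the median of absolute deviations) and then divides by the maximum intensity. This is a simplified reading of the table entry, not mMass code; the function names are hypothetical, and the relative offset and baseline smoothing mentioned in the table are left out.

```python
from statistics import median


def baseline_level(intensity):
    """Constant baseline estimate: median minus median absolute deviation."""
    center = median(intensity)
    deviation = median([abs(value - center) for value in intensity])
    return center - deviation


def subtract_baseline(intensity):
    """Subtract the constant baseline and clip negative results to zero."""
    level = baseline_level(intensity)
    return [max(value - level, 0.0) for value in intensity]


def normalize_to_max(intensity):
    """Scale intensities into the range 0-1 by dividing by the maximum."""
    top = max(intensity)
    if top <= 0:
        raise ValueError("spectrum has no positive intensity")
    return [value / top for value in intensity]


# Hypothetical spectrum: a single peak on a raised floor.
spectrum = [5, 6, 5, 7, 45, 6, 5]
corrected = subtract_baseline(spectrum)
scaled = normalize_to_max(corrected)
print(corrected)
print(scaled)
```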

Non-R tools for importing mass spectrometry data files

Application | License | Version | File Formats | Miscellaneous | Authors
pymzML | LGPL | 0.7.4 | mzML | Python 2.6.5/Python 3 | Till Bald, Johannes Barth, Anna Niehues, Michael Specht, Michael Hippler, Christian Fufezan

Abbreviations:

AUC: area under the curve
BC: baseline correction
CL: classification
DN: denoising
PA: peak alignment
PD: peak detection
SNR: signal to noise ratio
TIC: total ion current/total ion count
GC/MS: gas chromatography/mass spectrometry
LC/MS: liquid chromatography/mass spectrometry
MS: mass spectrometry
NA: not available
NN: not needed
L: Linux
M: Mac OS X
W: Microsoft Windows

 

Last modified: 2012-05-06
