MassID provides near complete annotation of metabolomics data with identification probabilities

Stancliffe, E.; Gandhi, M.; Guzior, D. V.; Mehta, A.; Acharya, S.; Richardson, A. D.; Cho, K.; Cohen, T.; Patti, G. J.

2026-02-14 bioinformatics

10.64898/2026.02.11.704864 bioRxiv


Liquid chromatography coupled to mass spectrometry (LC/MS) is a powerful tool in metabolomics research, generating tens of thousands of signals from a single biological sample. However, current software solutions for unbiased analysis of metabolomics data are limited by complex sources of noise and non-quantitative metabolite identifications that make results difficult to interpret. Here, we present MassID, a cloud-based untargeted metabolomics pipeline that aims to overcome the innate challenges of unbiased metabolite analysis and to perform end-to-end data processing, transforming raw spectra into normalized and identified metabolite profiles. MassID incorporates a suite of software functionalities, including deep learning-based peak detection and comprehensive noise filtering. In addition, with MassID we introduce a novel software module, DecoID2, that enables probabilistic metabolite identification for false discovery rate (FDR)-controlled metabolomics. When applied to a human plasma dataset, MassID yields near-complete signal annotation and the identification of >4,000 metabolites (including >1,200 compounds at an FDR <5%) across four complementary LC/MS runs, and enables integrated downstream analyses to understand biochemical dysregulation at both the molecular and pathway levels. Identification probability generally correlated with Metabolomics Standards Initiative (MSI) confidence levels. However, only 356 of 418 MSI Level 1 compounds were identified at an FDR <5%, and the remaining 884 compounds identified at an FDR <5% were MSI Level 2-3 compounds, highlighting the enhanced specificity and discovery potential achieved by MassID.
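The abstract does not describe how DecoID2 converts identification probabilities into an FDR, so the following is only a generic sketch of one common approach, not the authors' method. It assumes each identification carries a probability of being correct, ranks identifications from most to least confident, and estimates the FDR of the top-ranked set as the mean of (1 - probability) within that set. The function name `select_at_fdr` and the example probabilities are hypothetical.

```python
def select_at_fdr(probabilities, fdr_threshold=0.05):
    """Return indices of identifications accepted at an estimated FDR.

    Generic illustration (not DecoID2's documented algorithm): the FDR of
    the top-k identifications is estimated as the average probability of
    being wrong, i.e. mean(1 - p), over those k identifications.
    """
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    # Rank identifications from most to least confident.
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)

    accepted_count = 0
    expected_false = 0.0
    for rank, idx in enumerate(order, start=1):
        # Expected number of incorrect identifications among the top `rank`.
        expected_false += 1.0 - probabilities[idx]
        if expected_false / rank <= fdr_threshold:
            accepted_count = rank

    return order[:accepted_count]


# Hypothetical probabilities for five candidate identifications.
example = [0.99, 0.98, 0.60, 0.97, 0.20]
accepted = select_at_fdr(example, fdr_threshold=0.05)
```

Under this kind of scheme, a compound is accepted on the strength of its probability rather than its MSI level, which is one way an MSI Level 1 compound could fall outside a 5% FDR set while MSI Level 2-3 compounds fall inside it.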

