
MetaXtract: Extracting Metadata from Raw Files for FAIR Data Practices and Workflow Optimisation

Lutfi, A.; Chen, Z. A.; Fischer, L.; Rappsilber, J.

2026-03-16 bioinformatics
10.1101/2025.11.12.687968 bioRxiv

Mass spectrometry (MS) experiments generate rich acquisition metadata that are essential for reproducibility, data sharing, and quality control (QC). Because these metadata are typically stored only in vendor-specific formats, they often remain difficult to access. MetaXtract is a lightweight tool that extracts detailed parameters directly from Thermo Fisher raw files and exposes them in structured, tabular formats. By capturing sample information, LC-MS method settings, and scan-level metrics such as retention time, total ion current, and ion injection time, MetaXtract increases transparency and ensures that essential acquisition details accompany published data and results in an easily readable form. This supports FAIR data practices by improving the findability, accessibility, interoperability, and reusability of MS datasets after they are converted to other formats, thereby increasing the value of deposition in public repositories. The importance of such metadata accessibility was recently highlighted by the crosslinking mass spectrometry community in efforts to advance FAIR data principles, and it extends to MS-based omics approaches more broadly. Importantly, MetaXtract enables search-free, near real-time performance monitoring by relying on acquisition-side signals, providing actionable indicators immediately after data acquisition rather than after database searching. Integration into automated pipelines also supports streamlined internal QC and troubleshooting within laboratories and repositories. By embedding acquisition parameters into routine data handling, MetaXtract strengthens reproducibility, optimises method development, and supports large-scale applications, including machine learning and secondary data analysis.
Graphical Abstract
[Figure 1]

Highlights
- Metadata extraction from Thermo Fisher raw files
- Enhanced findability, accessibility, interoperability, and reusability of deposited data
- Integration into workflows via GUI and command-line modes
- Troubleshooting support by visualizing MS1/MS2 scan details
- Indexed MS1/MS2 peak list export enabling machine learning workflows

Availability
MetaXtract is available for free download as open-source software at https://github.com/Rappsilber-Laboratory/MetaXtract. The software is licensed under the Apache-2.0 license.
