ChatDIA: A zero-shot large language model workflow for targeted analysis of data-independent acquisition mass spectrometry data

Li, J.; Charkow, J.; Gao, M.; Li, J.; Rost, H.

2026-02-13 bioinformatics

10.64898/2026.02.11.705360 bioRxiv

Data-independent acquisition (DIA) proteomics enables reproducible, large-scale protein identification and quantification, but DIA data remain challenging to analyze because of highly complex MS/MS spectra and chromatographic interference, particularly in low signal-to-noise single-cell proteomics. Here, we introduce ChatDIA, a zero-shot large language model (LLM)-based workflow for targeted DIA analysis that operates through an explicit, reasoning-based decision framework. ChatDIA performs automated peptide identification and supports natural-language interaction with DIA data. Unlike purpose-built DIA software that relies on domain-specific models, ChatDIA uses general-purpose LLMs in a zero-shot setting to reason directly over extracted ion chromatograms and to generate a human-interpretable rationale for each decision. On an expert-annotated Streptococcus pyogenes DIA benchmark dataset, ChatDIA achieved 96.9% accuracy, matching the state-of-the-art domain-specific software DIA-NN (95.5%). On a challenging single-cell HEK-293T DIA proteomics dataset, ChatDIA remained competitive: it achieved a lower (better) area under the risk-coverage curve than DIA-NN (0.06 vs. 0.12) and identified 17.5% and 45.25% of library peptides at 1% and 5% false discovery rate (FDR), respectively, compared with 16.25% and 48% for DIA-NN. Together, these results demonstrate that zero-shot LLM reasoning can automate core targeted DIA decision-making competitively while providing transparent, inspectable rationales that enable conversational, interactive validation and data exploration in noisy proteomics applications.
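The risk-coverage AUC quoted in the abstract summarizes how the error rate grows as more decisions are accepted in order of confidence. The sketch below is illustrative only. It assumes each peptide decision carries a scalar confidence score and a ground-truth correct/incorrect label (for example, from expert annotation). The function names `risk_coverage_curve`, `aurc`, and `coverage_at_risk` are hypothetical. The AUC is computed as the mean selective risk over all coverage levels, a common discrete convention that is not necessarily the exact definition used for ChatDIA. `coverage_at_risk` is likewise an assumed reading of "fraction of peptides accepted at a given error rate" when true labels are known, and is not presented as how the FDR figures above were derived.

```python
from typing import List, Sequence, Tuple


def risk_coverage_curve(
    confidences: Sequence[float], correct: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (coverage, risk) after accepting the k most confident decisions, k = 1..n."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    # Rank decisions from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    n = len(order)
    errors = 0
    points: List[Tuple[float, float]] = []
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        # Coverage = fraction accepted; risk = error rate among accepted decisions.
        points.append((k / n, errors / k))
    return points


def aurc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Area under the risk-coverage curve, taken as the mean risk over all k (lower is better)."""
    points = risk_coverage_curve(confidences, correct)
    return sum(risk for _, risk in points) / len(points)


def coverage_at_risk(
    confidences: Sequence[float], correct: Sequence[bool], max_risk: float
) -> float:
    """Largest coverage whose empirical error rate among accepted decisions is <= max_risk."""
    best = 0.0
    for coverage, risk in risk_coverage_curve(confidences, correct):
        if risk <= max_risk:
            best = max(best, coverage)
    return best


if __name__ == "__main__":
    conf = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
    labels = [True, True, True, False, True, False]
    print(aurc(conf, labels))
    print(coverage_at_risk(conf, labels, 0.2))
```

A model whose confident decisions are all correct keeps the risk near zero at low coverage and therefore gets a small AUC. A single confident error near the top of the ranking raises the risk at every subsequent coverage level.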
