An AI-Driven Decision-Support Tool for Triage of COVID-19 Patients Using Respiratory Microbiome Data

Avina-Bravo, E. G.; Garcia-Lorenzo, I.; Alfaro-Ponce, M.; Breton-Deval, L.

2026-03-19 bioinformatics
10.64898/2026.03.18.712739 bioRxiv

Accurate clinical triage is critical for optimizing decision-making and resource allocation during infectious disease outbreaks such as COVID-19. In this study, we present an AI-driven decision-support tool for the triage of COVID-19 patients based on respiratory microbiome profiles derived from shotgun metagenomic sequencing. We analyzed 477 shotgun respiratory metagenomes from three independent public cohorts and generated genus-level taxonomic profiles, which were integrated with minimal clinical metadata to train supervised machine-learning models, including Random Forest, Support Vector Machine, and XGBoost. Model performance was evaluated using standard classification metrics and cross-validation, with particle swarm optimization used for hyperparameter tuning. Across cohorts, we observed a consistent transition from microbiomes dominated by commensal taxa to dysbiotic states enriched in opportunistic and clinically relevant genera, particularly Acinetobacter and Staphylococcus, in severe and deceased patients. Among the evaluated models, XGBoost consistently achieved the best performance, reaching up to 96.1% accuracy, 97.6% F1-score, and 98.2% ROC-AUC in individual cohorts. When trained on the integrated dataset, XGBoost maintained robust performance (95.1% accuracy, 97.2% F1-score, 94.3% ROC-AUC) and demonstrated greater stability and lower variance than the alternative models. Feature-importance analyses identified a compact and interpretable set of recurrent microbial predictors, and reduced-feature models retained substantial discriminative power when augmented with key clinical variables. These results support the respiratory microbiome as a valuable source of information for outcome-oriented clinical triage and position microbiome-informed machine learning as a scalable and interpretable decision-support approach for managing COVID-19 and future infectious disease scenarios.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1 | Microbiome | 139 papers in training set | Top 0.1% | 14.0%
2 | Nature Communications | 4913 papers in training set | Top 15% | 12.0%
3 | Genome Medicine | 154 papers in training set | Top 1% | 6.2%
4 | Advanced Science | 249 papers in training set | Top 4% | 4.7%
5 | Scientific Reports | 3102 papers in training set | Top 29% | 4.2%
6 | Nature Machine Intelligence | 61 papers in training set | Top 0.8% | 3.9%
7 | Genome Biology | 555 papers in training set | Top 2% | 3.9%
8 | Nucleic Acids Research | 1128 papers in training set | Top 6% | 3.6%
50% of probability mass above
9 | Briefings in Bioinformatics | 326 papers in training set | Top 2% | 3.5%
10 | PLOS Computational Biology | 1633 papers in training set | Top 10% | 3.5%
11 | mSystems | 361 papers in training set | Top 3% | 3.2%
12 | Cell Systems | 167 papers in training set | Top 5% | 2.5%
13 | Cell Reports Medicine | 140 papers in training set | Top 3% | 1.8%
14 | Nature Biotechnology | 147 papers in training set | Top 5% | 1.6%
15 | Microbial Genomics | 204 papers in training set | Top 1% | 1.6%
16 | PLOS ONE | 4510 papers in training set | Top 56% | 1.6%
17 | Nature Microbiology | 133 papers in training set | Top 3% | 1.4%
18 | Bioinformatics | 1061 papers in training set | Top 8% | 1.4%
19 | Cell Reports Methods | 141 papers in training set | Top 3% | 1.3%
20 | Bioinformatics Advances | 184 papers in training set | Top 4% | 1.2%
21 | npj Systems Biology and Applications | 99 papers in training set | Top 2% | 0.9%
22 | iScience | 1063 papers in training set | Top 25% | 0.9%
23 | Frontiers in Microbiology | 375 papers in training set | Top 8% | 0.9%
24 | Nature Medicine | 117 papers in training set | Top 4% | 0.9%
25 | mSphere | 281 papers in training set | Top 6% | 0.8%
26 | Communications Biology | 886 papers in training set | Top 22% | 0.8%
27 | GigaScience | 172 papers in training set | Top 3% | 0.8%
28 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 4% | 0.8%
29 | eLife | 5422 papers in training set | Top 59% | 0.7%
30 | Patterns | 70 papers in training set | Top 3% | 0.7%
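
The 50% cutoff stated above can be checked with a short calculation. The sketch below assumes the cutoff is the smallest number of top-ranked journals whose summed probabilities reach 50%; that definition, and the helper name `journals_to_reach`, are illustrative assumptions rather than anything documented by the listing. The probabilities are the first ten values from the ranked list.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-10, copied from the ranked list above.
probs = [14.0, 12.0, 6.2, 4.7, 4.2, 3.9, 3.9, 3.6, 3.5, 3.5]


def journals_to_reach(probabilities, threshold=50.0):
    """Return how many top-ranked entries are needed for the running sum
    to reach `threshold`, or None if the threshold is never reached.

    Hypothetical helper: the cutoff rule is an assumption, not a
    documented part of the listing.
    """
    for count, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return count
    return None


n = journals_to_reach(probs)
print(n)
```

Under this assumption the running sum is about 48.9% after seven journals and about 52.5% after eight, which agrees with the statement that the top 8 journals account for 50% of the predicted probability mass.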