
Performance of IBD machine learning classifiers varies across microbiome training data independent of geographic diversity

Wolf, A.; Cirolia, G.; Gustafson, J. T.; Aswani, A.

2026-05-22 microbiology
10.64898/2026.05.21.727052 bioRxiv

Microbiome-based machine learning classifiers show increasing promise for disease identification across gastrointestinal, metabolic, and immune-mediated conditions. Inflammatory bowel disease (IBD), a chronic immune-mediated disorder associated with disruption of the gut microbiome, has been a particularly successful application area. However, while many predictive models achieve high performance within individual datasets, their ability to generalize across independent populations and geographic contexts remains unclear. Here, we tested whether model class and training dataset composition influence model generalizability across geographically diverse evaluation studies. We compiled seven publicly available shotgun metagenomic studies spanning five geographic regions, comprising 697 individuals with IBD or healthy controls. We trained 246,986 model configurations across seven model classes and five distinct training dataset combinations and evaluated top-performing models on independent studies from the USA, Ireland, Germany, Israel, and China. Extreme gradient boosting and random forest models showed the highest and most consistent performance across training datasets, a ranking that was maintained on independent evaluation studies. However, models trained on geographically diverse datasets did not outperform those trained on USA-only datasets. Instead, model performance was strongly dependent on the evaluation study itself, with consistent differences in achievable accuracy across studies. Despite most models achieving similar AUC scores, there was limited overlap in the key microbial species identified. Furthermore, even for the small set of disease-predictive microbes shared between models, the direction of enrichment between IBD and healthy subjects often differed across study populations.
These findings suggest that study-specific factors constrain generalization and may help explain the lack of consistent microbiome-based biomarkers for IBD.

Importance

Machine learning models based on the human gut microbiome are increasingly proposed as diagnostic tools for inflammatory bowel disease, but our findings suggest that identifying reliable microbiome biomarkers poses a challenge. Models trained on different datasets often selected different species as important predictors, even when diagnostic performance was similar, indicating that disease-associated microbes may depend strongly on the patient populations studied. Even species repeatedly selected across training datasets frequently showed inconsistent associations with disease, helping explain low agreement across microbiome studies. Importantly, models performed well across new patient groups independent of the geographic diversity present in the training datasets. By identifying microbial species repeatedly selected across datasets, model types, and evaluation studies, we identified a smaller group of more consistent biomarkers, including enrichment of Klebsiella pneumoniae and Erysipelatoclostridium ramosum and depletion of Lachnospiraceae and Alistipes species, which may represent stronger candidates for transferable microbiome markers.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Gut Microbes: 70 papers in training set; Top 0.1%; 12.3%
2. mSystems: 361 papers in training set; Top 0.7%; 10.1%
3. Microbiome: 139 papers in training set; Top 0.3%; 8.4%
4. mSphere: 281 papers in training set; Top 0.6%; 6.3%
5. Cell Host & Microbe: 113 papers in training set; Top 1%; 4.9%
6. mBio: 750 papers in training set; Top 3%; 4.9%
7. eLife: 5422 papers in training set; Top 20%; 4.3%

50% of probability mass above

8. Scientific Reports: 3102 papers in training set; Top 34%; 3.7%
9. Nature Communications: 4913 papers in training set; Top 40%; 3.6%
10. npj Biofilms and Microbiomes: 56 papers in training set; Top 0.7%; 2.6%
11. Genome Medicine: 154 papers in training set; Top 4%; 1.9%
12. PLOS Computational Biology: 1633 papers in training set; Top 14%; 1.9%
13. Microbial Genomics: 204 papers in training set; Top 1%; 1.7%
14. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 32%; 1.7%
15. PLOS ONE: 4510 papers in training set; Top 53%; 1.7%
16. The Journal of Infectious Diseases: 182 papers in training set; Top 3%; 1.3%
17. PLOS Biology: 408 papers in training set; Top 12%; 1.3%
18. Frontiers in Microbiology: 375 papers in training set; Top 7%; 1.2%
19. Frontiers in Cellular and Infection Microbiology: 98 papers in training set; Top 4%; 1.1%
20. Frontiers in Immunology: 586 papers in training set; Top 6%; 0.9%
21. Journal of Clinical Microbiology: 120 papers in training set; Top 1%; 0.9%
22. Cell Reports Medicine: 140 papers in training set; Top 7%; 0.9%
23. eBioMedicine: 130 papers in training set; Top 3%; 0.9%
24. Cell Reports: 1338 papers in training set; Top 32%; 0.8%
25. BMC Biology: 248 papers in training set; Top 4%; 0.8%
26. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 9%; 0.7%
27. Genome Biology: 555 papers in training set; Top 7%; 0.7%
28. Journal of Infection: 71 papers in training set; Top 3%; 0.7%
29. Cell: 370 papers in training set; Top 17%; 0.7%
30. Science Translational Medicine: 111 papers in training set; Top 7%; 0.7%
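
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked against the listed values with a running total. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, and the probabilities are the rounded percentages shown in the ranked list above, so the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the seven top-ranked journals,
# copied from the ranked list above (rounded values as displayed).
top_probs = [12.3, 10.1, 8.4, 6.3, 4.9, 4.9, 4.3]


def journals_to_reach(probs, threshold):
    """Return how many leading entries are needed for the running
    total to reach `threshold`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_to_reach(top_probs, 50.0))  # prints 7
```

With the displayed values, the first six journals sum to about 46.9% and the seventh brings the total to about 51.2%, which is consistent with the cutoff marker placed after eLife.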