TB-Bench: A Systematic Benchmark of Machine Learning and Deep Learning Methods for Second-Line TB Drug Resistance Prediction

VP, B.; Jaiswal, S.; Meshram, A.; PVS, D.; S C, S.; Narayanan, M.

2026-04-13 · bioinformatics
bioRxiv · doi: 10.64898/2026.04.08.717138
Abstract

Drug-resistant tuberculosis (TB), characterized by prolonged regimens and suboptimal treatment outcomes, remains a major obstacle to global TB elimination. Advances in sequencing technologies have enabled the development of machine-learning (ML) approaches, including deep-learning (DL) methods, to predict drug resistance directly from genomic data. However, a significant gap remains in translating these advances into clinical practice. While current approaches reliably predict resistance to first-line drugs, they show consistently lower and more variable performance for second-line drugs compared with traditional drug-susceptibility testing. To characterize these limitations and assess practical utility, we conducted a comprehensive survey and standardized benchmarking of current approaches for predicting TB drug resistance from whole-genome sequencing (WGS) data. Using systematic selection criteria, we identified 20 traditional ML and DL models from 8 studies and evaluated drug-specific versions across 14 second-line drugs within a unified framework. To account for methodological heterogeneity, the models were evaluated using three distinct feature sets reflecting variability in input representations. We trained and evaluated the models on different subsets of the WHO dataset, comprising 50,801 samples, and assessed generalizability on an external validation dataset of 1,199 samples. In the internal evaluation on the held-out WHO test dataset, traditional ML models using binary features achieved higher predictive performance than DL models. For example, XGBoost achieved the highest area under the precision-recall curve (PRAUC) scores (46%-93%) for 10 of the 14 drugs. However, performance varied substantially across drugs. Notably, the superior performance of traditional ML models, even with limited feature sets, highlights their applicability in low-resource settings. When evaluated on the external validation dataset, the performance of traditional ML and DL models was comparable, and neither class of models demonstrated substantial improvement over catalogue-based approaches, underscoring challenges in cross-dataset generalization. Overall, this benchmarking study provides a comprehensive and systematic evaluation of current approaches, establishes a rigorous evaluation framework for future comparisons, and identifies key methodological considerations necessary to advance robust drug resistance prediction in clinical settings. To enhance reproducibility and facilitate the application of TB-Bench to additional datasets and models, we have made the source code publicly available at https://github.com/BIRDSgroup/TB-Bench.
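The headline metric throughout the abstract is the area under the precision-recall curve (PRAUC). As a hedged illustration of how such a score can be computed from predicted resistance probabilities and binary phenotypes, here is a minimal average-precision sketch; it is not the authors' pipeline, and the function name and toy data are invented for this example:

```python
def pr_auc(y_true, scores):
    """Average-precision estimate of the area under the PR curve.

    y_true: list of 0/1 phenotypes (1 = resistant)
    scores: list of predicted resistance probabilities
    """
    # Sort samples by descending score, then sweep the decision threshold.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(y_true)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank  # precision at each recall step
    return ap / total_pos if total_pos else 0.0

# Toy example: four isolates, two of them resistant.
labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.4, 0.1]
print(round(pr_auc(labels, probs), 3))  # prints 0.833
```

In practice one would use a library implementation (e.g. scikit-learn's `average_precision_score`) on the model's predicted probabilities; the sketch above only makes the metric's definition concrete.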

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank  Journal                                                     Papers in training set  Percentile  Probability
   1  Scientific Reports                                          3102                    Top 9%      8.5%
   2  Genome Medicine                                             154                     Top 0.6%    8.3%
   3  Nature Communications                                       4913                    Top 26%     6.9%
   4  The Lancet Microbe                                          43                      Top 0.1%    6.4%
   5  eLife                                                       5422                    Top 17%     4.9%
   6  Journal of Clinical Microbiology                            120                     Top 0.4%    4.9%
   7  Nature Machine Intelligence                                 61                      Top 0.7%    4.3%
   8  PLOS Computational Biology                                  1633                    Top 9%      3.6%
   9  Cell Systems                                                167                     Top 5%      2.8%
      -- 50% of probability mass above --
  10  PLOS ONE                                                    4510                    Top 44%     2.8%
  11  Nucleic Acids Research                                      1128                    Top 8%      2.1%
  12  Bioinformatics                                              1061                    Top 6%      2.1%
  13  Cell Reports Medicine                                       140                     Top 3%      2.1%
  14  Clinical Infectious Diseases                                231                     Top 2%      1.9%
  15  Cell Genomics                                               162                     Top 3%      1.9%
  16  Communications Biology                                      886                     Top 6%      1.9%
  17  Microbial Genomics                                          204                     Top 1.0%    1.9%
  18  Briefings in Bioinformatics                                 326                     Top 4%      1.7%
  19  American Journal of Respiratory and Critical Care Medicine  39                      Top 0.5%    1.5%
  20  The Journal of Infectious Diseases                          182                     Top 4%      1.1%
  21  eBioMedicine                                                130                     Top 3%      1.0%
  22  Communications Medicine                                     85                      Top 0.6%    1.0%
  23  NAR Genomics and Bioinformatics                             214                     Top 3%      0.8%
  24  PLOS Biology                                                408                     Top 18%     0.8%
  25  Genome Research                                             409                     Top 4%      0.8%
  26  BMC Genomics                                                328                     Top 6%      0.8%
  27  European Respiratory Journal                                54                      Top 2%      0.7%
  28  mBio                                                        750                     Top 12%     0.7%
  29  BMC Bioinformatics                                          383                     Top 7%      0.7%
  30  JAC-Antimicrobial Resistance                                13                      Top 0.5%    0.6%
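The "50% of probability mass above" marker is a simple cumulative-sum cutoff over the ranked journal probabilities. A minimal sketch, assuming the percentages listed above (the function name is invented):

```python
def journals_covering(probs, mass=50.0):
    """Return how many top-ranked journals are needed to reach `mass` percent.

    probs: journal probabilities in descending rank order, as percentages.
    """
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= mass:
            return k
    return len(probs)  # the full list never reaches the target mass

# Probabilities of the ten highest-ranked journals above.
top10 = [8.5, 8.3, 6.9, 6.4, 4.9, 4.9, 4.3, 3.6, 2.8, 2.8]
print(journals_covering(top10))  # prints 9
```

The cumulative total after rank 9 is 50.6%, which is the first point at or above 50%, matching the stated cutoff.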