AI/ML-based prediction of TB treatment failure: A systematic review and meta-analysis
Kamulegeya, R.; Nabatanzi, R.; Semugenze, D.; Mugala, F.; Takuwa, M.; Nasinghe, E.; Musinguzi, D.; Namiiro, S.; Katumba, A.; Ssengooba, W.; Nakatumba-Nabende, J.; Kivunike, F. N.; Kateete, D. P.
Show abstract
BackgroundTuberculosis (TB) remains a leading cause of infectious disease mortality worldwide, and treatment failure contributes to ongoing transmission, drug resistance, and poor clinical outcomes. Artificial intelligence and machine learning approaches have attracted growing interest for predicting tuberculosis treatment outcomes, but the literature is heterogeneous and lacks a comprehensive synthesis. MethodsWe conducted a systematic review and meta-analysis of studies that developed or validated machine learning models to predict TB treatment failure. We searched PubMed/MEDLINE and Embase from January 2000 to October 2025. Studies were eligible if they developed, validated, or implemented an artificial intelligence or machine learning model for the prediction of TB treatment failure or a closely related poor outcome in patients receiving anti-TB treatment. Risk of bias was assessed using the Prediction model Risk Of Bias Assessment Tool. Random-effects meta-analysis was performed to pool area under the curve values, with subgroup analyses and meta-regression to explore heterogeneity. ResultsThirty-four studies were included in the systematic review, of which 19 reported area under the curve values suitable for meta-analysis (total participants, 100,790). Studies were published between 2014 and 2025, with 91% published from 2019 onward. Tree-based methods were the most common algorithm family (52.9%), and multimodal models integrating three or more data types were used in 41.2% of studies. The pooled area under the curve was 0.836 (95% confidence interval 0.799-0.868), with substantial heterogeneity (I{superscript 2} = 97.9%). In subgroup analyses, studies including HIV-positive participants showed lower discrimination (pooled area under the curve 0.748) compared to those excluding them (0.924). Only eight studies (23.5%) performed external validation, and only one study (2.9%) was rated as low risk of bias overall, primarily due to methodological concerns in the analysis domain. Eggers test suggested publication bias (p = 0.024). Major evidence gaps included underrepresentation of high-burden countries, HIV-affected populations, social determinants, pediatric TB, and extrapulmonary disease. ConclusionsMachine learning models for predicting TB treatment failure show promising discrimination but are not yet ready for routine clinical implementation. Performance varies substantially across populations and settings, and methodological limitations, including inadequate validation, poor calibration assessment, and high risk of bias, limit confidence in current estimates. Future research should prioritize rigorous external validation, calibration assessment, and development in underrepresented populations, particularly HIV-affected and high-burden settings. Author SummaryTB kills over a million people annually. While curable, treatment failure remains common and drives ongoing transmission and drug resistance. Researchers increasingly use artificial intelligence and machine learning to predict which patients will fail treatment, but it is unclear if these models are ready for clinical use. We reviewed 34 studies including nearly 1.1 million participants from 22 countries. On average, models correctly distinguished patients who would fail treatment from those who would not 84% of the time, a performance generally considered good. However, this average hid enormous variation. Models developed in populations including HIV-positive people performed substantially worse, suggesting prediction is harder with HIV co-infection. Worryingly, only one study used high-quality methods; 97% had serious flaws in handling missing data, checking calibration, or testing in new populations. Only eight studies validated their models in different settings. To conclude, we found that machine learning is promising in predicting TB treatment failure, but it is not ready for clinical use. Researchers should prioritize validation in high-burden settings, include social determinants, and improve methodological rigor before these tools can help patients.
Matching journals
The top 8 journals account for 50% of the predicted probability mass.