
Development and validation of a machine learning model for community-based tuberculosis screening among persons aged >= 15 years in South Africa and Zambia

Zimmer, A. J.; Loharja, H.; Fentahun Muchie, K.; Koeppel, L.; Ayles, H.; Castro, M. d. M.; Christodoulou, E.; Fox, G. J.; Gaeddert, M.; Hamada, Y.; Isaacs, C.; Kapata, N.; Chanda-Kapata, P.; Karimi, K.; Kasese, N.; Kerkhoff, A.; Law, I.; Maier-Hein, L.; Marx, F. M.; Maimbolwa, M. M.; Moyo, S.; Mthiyane, T.; Muyoyeta, M.; Rocklöv, J.; Schaap, A.; Yerlikaya, S.; Opata, M.; Denkinger, C. M.

2026-04-04 public and global health

10.64898/2026.03.30.26349632 medRxiv


Introduction: Current tuberculosis (TB) screening tools, such as the WHO four-symptom screen (W4SS), lack sufficient sensitivity and specificity for effective community-based active case finding, contributing to both missed diagnoses and unnecessary diagnostic evaluations. This study aimed to develop and validate a machine learning (ML) model to improve TB risk prediction among persons aged >=15 years in community settings of Zambia and South Africa.
Methods: A large, harmonized dataset was created from four community-based TB prevalence surveys in South Africa and Zambia (N=169,813), restricted to individuals not under treatment at the time of survey. A binary reference outcome was defined based on available microbiological and radiographic data, grouping individuals as either 'Possible TB' or 'Unlikely TB'. An XGBoost model was trained on 80% (N=135,854) of the data using demographic, clinical, and socio-economic variables, and model interpretability was assessed using SHapley Additive exPlanations (SHAP) values. Internal validation was performed using a 20% hold-out test set (N=33,959). Model performance was assessed using discrimination, calibration, and clinical utility measures compared to the W4SS and against WHO's 2025 Target Product Profile (TPP) for a tool in a two-step screening algorithm.
Results: Overall, 16,413 (9.7%) individuals were labelled as 'Possible TB'. On the test set, the XGBoost model yielded an area under the curve (AUC) of 79.7% (95% CI: 78.7, 80.7), outperforming the W4SS (AUC 57.0%; 95% CI: 56.1, 57.8). At a 60% specificity threshold, the XGBoost model achieved 81.5% sensitivity (95% CI: 77.6, 84.9), compared with only 38.2% sensitivity (95% CI: 36.5, 39.9) for the W4SS on the same dataset. SHAP analysis identified age, previous TB treatment, times treated for TB and unemployment as the primary contributors to risk.
Conclusion: The ML XGBoost model shows promise as a screening tool to support community-based active case finding activities prior to diagnostic testing. However, performance remained below TPP targets, and adding variables, e.g. on geolocation, could be considered.
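
The abstract reports sensitivity at a 60% specificity threshold but does not say how that operating point was selected. As an illustration only, the sketch below shows one common way to read sensitivity off a set of risk scores at a fixed specificity target. The function name `sensitivity_at_specificity` and the toy labels and scores are hypothetical; they are not the study's code or data.

```python
import numpy as np

def sensitivity_at_specificity(y_true, scores, target_specificity=0.60):
    """Return (threshold, sensitivity, specificity) at the lowest score
    threshold whose specificity is at least target_specificity.

    A person is screened positive when score >= threshold. Raising the
    threshold can only raise specificity and lower sensitivity, so the
    first threshold that meets the target gives the highest sensitivity
    available at that target.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    # Candidate thresholds: every observed score in ascending order, plus
    # +inf (nobody screened positive), which always has specificity 1.
    for t in np.append(np.unique(scores), np.inf):
        pred = scores >= t
        spec = float(np.mean(~pred[~y_true]))  # true negatives / all negatives
        sens = float(np.mean(pred[y_true]))    # true positives / all positives
        if spec >= target_specificity:
            return float(t), sens, spec

# Toy data (assumed for illustration): 5 'Unlikely TB' (0) and 5 'Possible TB' (1)
y_example = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
s_example = [0.1, 0.2, 0.3, 0.6, 0.7, 0.25, 0.5, 0.65, 0.8, 0.9]

thr, sens, spec = sensitivity_at_specificity(y_example, s_example, 0.60)
print(f"threshold={thr:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In practice the threshold would be chosen on training or validation data and then applied unchanged to the hold-out test set, so that the reported sensitivity is not optimistically biased.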

