Interpretable Machine Learning for Population-Level Severe Tooth Loss Prediction: A Two-Axis External Validation

LAM, Q. T.; Fan, F.-Y.; Wang, Y.-L.; Wu, C.-Y.; Sun, Y.-S.; Vo, T. T. T.; Kuo, H.; Kha, Q. H.; Le, M. H. N.; Vu, G.; Le, N. Q. K.; Lee, I.-T.

2026-04-05 dentistry and oral medicine
10.64898/2026.04.03.26350106 medRxiv

Objectives: Machine learning can predict severe tooth loss (STL, 6 or more missing teeth), but opaque black-box models neglecting complex survey designs limit clinical adoption. This study developed and externally validated an intrinsically interpretable, survey-weighted framework for population-level STL prediction, capturing complex socio-behavioral and systemic health determinants.

Methods: We analyzed nationally representative data from BRFSS 2022 (derivation, N=433,772), BRFSS 2024 (temporal validation, N=448,213), and the clinically examined NHANES 2015-2018 (cross-domain validation, N=10,775). Missing data were resolved using an anti-leakage HistGradientBoosting MICE pipeline, preserving multivariate epidemiological variance. An Explainable Boosting Machine (EBM, GA2M) was natively trained by integrating complex survey weights. For external clinical validation, structural domain shift was addressed through non-parametric Isotonic Regression recalibration.

Results: The EBM achieved strong temporal stability on BRFSS 2024 (AUC: 0.8627; Brier Score: 0.0845). Upon cross-domain validation against NHANES 2015-2018, the calibrated model demonstrated robust transportability (AUC: 0.7504; Brier Score: 0.1358). Notably, the zero-shot EBM (AUC: 0.7591) closely matched the predictive ceiling of a black-box stacked meta-ensemble (AUC: 0.7706), eliminating the need for unstable post-hoc approximations. Fully auditable shape functions explicitly revealed non-linear risk thresholds and synergistic pairwise interactions for key predictors including age, smoking, income, and diabetes. Decision curve analysis confirmed substantial positive net clinical benefit across a 5%-50% risk threshold continuum.

Conclusions: The MICE-EBM framework predicts STL with complete intrinsic transparency and robust probabilistic reliability. By successfully generalizing across unobserved temporal and clinical cohorts, this TRIPOD+AI compliant framework provides a clinically deployable tool to optimize targeted dental public health interventions.
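
The abstract reports two evaluation quantities that are simple to state in code: the Brier score and the net benefit that underlies decision curve analysis. The following is a minimal sketch in plain Python, not the authors' implementation. The function names, the optional `weights` argument (a stand-in for survey weights; whether the reported metrics were survey-weighted is an assumption here), and the toy data are all hypothetical.

```python
from typing import Optional, Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean squared difference between predicted risk and outcome."""
    w = list(weights) if weights is not None else [1.0] * len(y_true)
    total = sum(w)
    return sum(wi * (p - y) ** 2 for wi, p, y in zip(w, y_prob, y_true)) / total


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float,
                weights: Optional[Sequence[float]] = None) -> float:
    """Net benefit of flagging everyone whose predicted risk is >= threshold.

    Standard decision-curve definition:
        NB = TP/N - (FP/N) * threshold / (1 - threshold)
    """
    w = list(weights) if weights is not None else [1.0] * len(y_true)
    total = sum(w)
    tp = sum(wi for wi, p, y in zip(w, y_prob, y_true) if p >= threshold and y == 1)
    fp = sum(wi for wi, p, y in zip(w, y_prob, y_true) if p >= threshold and y == 0)
    return tp / total - (fp / total) * threshold / (1.0 - threshold)


# Hypothetical toy data: four people, two with severe tooth loss.
y = [1, 0, 1, 0]
p = [0.8, 0.2, 0.6, 0.4]

toy_brier = brier_score(y, p)
toy_nb_30 = net_benefit(y, p, threshold=0.30)
toy_nb_50 = net_benefit(y, p, threshold=0.50)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds (the abstract refers to the 5%-50% range) and comparing the result with the "treat all" and "treat none" strategies.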

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Nature Communications: 4913 papers in training set, Top 13%, 12.5%
2. Scientific Reports: 3102 papers in training set, Top 4%, 12.5%
3. eLife: 5422 papers in training set, Top 5%, 10.5%
4. Nature Medicine: 117 papers in training set, Top 0.4%, 4.9%
5. Science Advances: 1098 papers in training set, Top 2%, 4.9%
6. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 19%, 3.6%
7. PLOS ONE: 4510 papers in training set, Top 42%, 3.1%
50% of probability mass above
8. PLOS Computational Biology: 1633 papers in training set, Top 11%, 2.9%
9. Frontiers in Immunology: 586 papers in training set, Top 3%, 2.6%
10. npj Digital Medicine: 97 papers in training set, Top 2%, 2.1%
11. The Lancet Digital Health: 25 papers in training set, Top 0.3%, 1.9%
12. Communications Medicine: 85 papers in training set, Top 0.2%, 1.9%
13. International Journal of Epidemiology: 74 papers in training set, Top 1%, 1.8%
14. PLOS Medicine: 98 papers in training set, Top 2%, 1.7%
15. Nature Human Behaviour: 85 papers in training set, Top 2%, 1.7%
16. PLOS Digital Health: 91 papers in training set, Top 1%, 1.7%
17. Frontiers in Public Health: 140 papers in training set, Top 5%, 1.7%
18. BMC Medicine: 163 papers in training set, Top 3%, 1.7%
19. Advanced Science: 249 papers in training set, Top 11%, 1.7%
20. American Journal of Epidemiology: 57 papers in training set, Top 0.7%, 1.7%
21. PLOS Biology: 408 papers in training set, Top 11%, 1.5%
22. Science Translational Medicine: 111 papers in training set, Top 3%, 1.3%
23. Communications Biology: 886 papers in training set, Top 17%, 1.0%
24. Journal of Dental Research: 13 papers in training set, Top 0.2%, 0.9%
25. Royal Society Open Science: 193 papers in training set, Top 4%, 0.8%
26. PNAS Nexus: 147 papers in training set, Top 2%, 0.8%
27. Epidemics: 104 papers in training set, Top 2%, 0.7%
28. Nature Biomedical Engineering: 42 papers in training set, Top 2%, 0.7%
29. European Radiology: 14 papers in training set, Top 0.8%, 0.7%
30. Biology Methods and Protocols: 53 papers in training set, Top 3%, 0.7%
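
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, and the inputs are the rounded percentages shown in the list, so the page's own unrounded values may differ slightly.

```python
from typing import Optional, Sequence, Tuple


def journals_to_reach(probabilities_pct: Sequence[float],
                      target_pct: float = 50.0) -> Optional[Tuple[int, float]]:
    """Return (rank, cumulative %) at which the running sum first reaches target_pct."""
    cumulative = 0.0
    for rank, pct in enumerate(probabilities_pct, start=1):
        cumulative += pct
        if cumulative >= target_pct:
            return rank, cumulative
    return None  # target never reached with the values given


# Rounded percentages for ranks 1-8 as displayed in the list.
displayed_pct = [12.5, 12.5, 10.5, 4.9, 4.9, 3.6, 3.1, 2.9]

cutoff_rank, cutoff_mass = journals_to_reach(displayed_pct)
print(cutoff_rank, round(cutoff_mass, 1))
```

With these values the running sum is about 48.9% after six journals and about 52.0% after seven, which agrees with the cutoff marked in the list.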