Class imbalance correction in artificial intelligence models leads to miscalibrated clinical predictions: a real-world evaluation

Roesler, M. W.; Wells, C.; Schamberg, G.; Gao, J.; Harrison, E.; O'Grady, G.; Varghese, C.

2026-03-05 | health informatics
DOI: 10.64898/2026.03.04.26347634 | medRxiv

Background: Predictive models employing machine learning algorithms are increasingly used in clinical decision making, and improperly calibrated models can result in systematic harm. We investigated the impact of class imbalance correction, a commonly applied preprocessing step in machine learning model development, on calibration and modelled clinical decision making in a large real-world context.

Methods: A histogram-based gradient boosting classifier was trained on a highly imbalanced national dataset of >1.8 million patients undergoing surgery to predict the risk of 90-day mortality and complications after surgery. Class imbalance correction strategies, including random oversampling (ROS), synthetic minority oversampling technique (SMOTE), random under-sampling (RUS), and cost-sensitive learning (CSL), were compared to the natural distribution (natural). Models were tested and compared with classification metrics, calibration plots, decision curve analysis, and simulated clinical impact analysis.

Results: The natural model demonstrated strong discrimination (AUROC 0.94, 95% CI 0.94-0.95 for mortality; 0.84, 95% CI 0.84-0.85 for complications) and good calibration (log loss 0.05, 95% CI 0.04-0.05 for mortality; 0.23, 95% CI 0.23-0.24 for complications). Class imbalance mitigation (CSL, ROS, RUS, and SMOTE) did not improve AUROC or AUPRC, but increased recall and F1 scores at the expense of precision and accuracy. These methods severely compromised model calibration, leading to significant over-prediction of risks (up to a 62.8% increase), as further evidenced by increased log loss across all mitigation techniques. Decision curve analysis and clinical scenario testing confirmed that the natural model provided the highest net benefit.

Conclusion: Class imbalance correction methods result in significant miscalibration, leading to possible harm when used for clinical decision making.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|------|---------|------------------------|------------|-------------|
| 1 | BMC Medical Informatics and Decision Making | 39 | Top 0.1% | 18.1% |
| 2 | JMIR Medical Informatics | 17 | Top 0.1% | 9.8% |
| 3 | Frontiers in Artificial Intelligence | 18 | Top 0.1% | 8.2% |
| 4 | International Journal of Medical Informatics | 25 | Top 0.2% | 6.2% |
| 5 | JCO Clinical Cancer Informatics | 18 | Top 0.1% | 6.2% |
| 6 | Scientific Reports | 3102 | Top 20% | 6.2% |
| 7 | Journal of Medical Internet Research | 85 | Top 1% | 3.9% |
| 8 | BMJ Health & Care Informatics | 13 | Top 0.1% | 3.9% |
| 9 | PLOS ONE | 4510 | Top 41% | 3.5% |
| 10 | BMC Medical Research Methodology | 43 | Top 0.3% | 3.5% |
| 11 | npj Digital Medicine | 97 | Top 1% | 3.0% |
| 12 | Biology Methods and Protocols | 53 | Top 0.5% | 2.5% |
| 13 | Journal of the American Medical Informatics Association | 61 | Top 1% | 2.3% |
| 14 | JAMIA Open | 37 | Top 0.8% | 1.7% |
| 15 | Computers in Biology and Medicine | 120 | Top 2% | 1.7% |
| 16 | Informatics in Medicine Unlocked | 21 | Top 0.7% | 1.1% |
| 17 | PLOS Digital Health | 91 | Top 2% | 0.9% |
| 18 | Artificial Intelligence in Medicine | 15 | Top 0.5% | 0.9% |
| 19 | JAMA Network Open | 127 | Top 3% | 0.9% |
| 20 | Computer Methods and Programs in Biomedicine | 27 | Top 0.9% | 0.8% |
| 21 | Frontiers in Digital Health | 20 | Top 1% | 0.8% |
| 22 | European Journal of Cancer | 10 | Top 0.6% | 0.7% |
| 23 | Acta Psychiatrica Scandinavica | 10 | Top 0.5% | 0.7% |
| 24 | JMIR Public Health and Surveillance | 45 | Top 4% | 0.7% |
| 25 | Clinical Infectious Diseases | 231 | Top 5% | 0.7% |
| 26 | PLOS Computational Biology | 1633 | Top 28% | 0.6% |