Calibrated and Interpretable Machine Learning for ICU Mortality Prediction Using First 24-Hour Clinical Data

Alsammani, A.; Johnson, M.; Elrefaei, J.

2026-06-02 health informatics
10.64898/2026.05.30.26354524 medRxiv

Objective: To develop, calibrate, and interpret machine learning models for predicting in-hospital mortality among intensive care unit (ICU) patients using clinical data collected during the first 24 hours of admission.

Methods: We analyzed 53,866 adult ICU admissions from the MIMIC-IV (v2.2) database, including 5,787 in-hospital deaths (10.7%). An enhanced feature-engineering pipeline generated 88 laboratory-based features capturing distributional characteristics, temporal trends, and measurement frequency. Five machine learning classifiers were evaluated: L2-regularized logistic regression, random forest, XGBoost, LightGBM, and a calibrated soft-voting ensemble. The data were divided with a stratified 64:8:8:20 split into training (64%), validation and hyperparameter tuning (8%), calibration (8%), and test (20%) sets. Performance on the held-out test set (n = 10,774) was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), the Brier score, calibration analysis, decision curve analysis (DCA), and SHAP-based model interpretation.

Results: The calibrated ensemble achieved the best overall performance, with an AUROC of 0.856 (95% CI: 0.846-0.867), an AUPRC of 0.449 (95% CI: 0.418-0.480), and a Brier score of 0.078. XGBoost (AUROC 0.856; AUPRC 0.435) and LightGBM (AUROC 0.854; AUPRC 0.436) performed comparably to the ensemble and significantly outperformed logistic regression (AUROC 0.823; AUPRC 0.376), with absolute AUROC improvements of approximately 0.031-0.033 (p < 0.001). Calibration substantially improved probabilistic predictions, reducing Brier scores by 42% for XGBoost (0.134 to 0.078) and by 50% for LightGBM (0.151 to 0.076). Decision curve analysis showed consistent net clinical benefit across the 5%-20% risk-threshold range. Key predictors included age, blood urea nitrogen, ICU subtype, measurement frequency, and lactate-related features. Model performance remained robust across ICU subtypes, with AUROC values exceeding 0.79 in each.

Conclusion: A calibrated and interpretable machine learning framework based on early ICU clinical data provides accurate and clinically actionable mortality risk estimates. By integrating trajectory-aware feature engineering, probabilistic calibration, and decision-analytic evaluation, this approach moves ICU mortality prediction toward more reliable and trustworthy clinical decision support.
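The stratified 64:8:8:20 split described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `stratified_split` and the partition names are hypothetical, and stratification is assumed to be on the binary in-hospital mortality label so that every partition keeps roughly the 10.7% death rate.

```python
import random
from collections import defaultdict

# Hypothetical partition names; the abstract gives only the 64:8:8:20 proportions.
SPLIT_NAMES = ("train", "validation", "calibration", "test")


def stratified_split(labels, fractions=(0.64, 0.08, 0.08, 0.20), seed=42):
    """Partition row indices so every subset keeps the outcome rate of `labels`."""
    if len(fractions) != len(SPLIT_NAMES):
        raise ValueError("one fraction per partition is required")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    parts = [[] for _ in fractions]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n = len(idxs)
        # Cumulative cut points computed separately within each outcome class.
        bounds, acc = [0], 0.0
        for f in fractions[:-1]:
            acc += f
            bounds.append(round(acc * n))
        bounds.append(n)
        for k in range(len(fractions)):
            parts[k].extend(idxs[bounds[k]:bounds[k + 1]])
    return dict(zip(SPLIT_NAMES, parts))


# Synthetic labels with the cohort's class balance: 5,787 deaths among 53,866 stays.
labels = [1] * 5787 + [0] * (53866 - 5787)
splits = stratified_split(labels)
for name, idx in splits.items():
    rate = sum(labels[i] for i in idx) / len(idx)
    print(f"{name:<12} n={len(idx):>6}  mortality={rate:.3f}")
```

Because cut points are rounded within each class, partition sizes from this sketch may differ by a row or two from the reported test set of n = 10,774; other tools (for example, chained stratified splits) round differently.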
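The Brier score reported in the Results is the mean squared difference between predicted probabilities and the observed 0/1 outcomes, and the quoted 42% and 50% figures are relative reductions. A short sketch reproduces that arithmetic from the abstract's own numbers; the function names are illustrative, not from the paper.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Toy example: lower is better, 0 is a perfect forecast.
print(brier_score([0, 1, 0, 1], [0.1, 0.8, 0.3, 0.6]))  # approximately 0.075

# A constant forecast equal to the cohort prevalence scores p * (1 - p).
prevalence = 5787 / 53866
print(f"prevalence-only Brier: {prevalence * (1 - prevalence):.3f}")

# Brier scores before and after calibration, as reported in the abstract.
for model, before, after in [("XGBoost", 0.134, 0.078), ("LightGBM", 0.151, 0.076)]:
    print(f"{model}: {before} -> {after} ({relative_reduction(before, after):.0%} lower)")
```

The prevalence-only forecast scores roughly 0.096 at a 10.7% death rate, which is a useful reference point when reading the uncalibrated scores of 0.134 and 0.151.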
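Decision curve analysis compares a model's net benefit with "treat all" and "treat none" strategies across risk thresholds. The sketch below uses the standard net-benefit definition, TP/n - (FP/n) * pt / (1 - pt) at threshold pt; the abstract does not describe the authors' implementation, and the toy cohort is invented purely for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening for patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Threshold odds weight a false positive against a true positive.
    odds = threshold / (1 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(prevalence, threshold):
    """Reference strategy that intervenes for everyone (treat-none has net benefit 0)."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy cohort; in practice y_prob would be calibrated test-set predictions.
y_true = [1, 0, 0, 1, 0]
y_prob = [0.90, 0.30, 0.05, 0.60, 0.15]
prevalence = sum(y_true) / len(y_true)
for t in (0.05, 0.10, 0.15, 0.20):  # spans the 5%-20% range discussed in the abstract
    print(f"threshold {t:.2f}: model {net_benefit(y_true, y_prob, t):+.3f}, "
          f"treat-all {net_benefit_treat_all(prevalence, t):+.3f}")
```

A model shows net clinical benefit at a threshold when its curve lies above both reference strategies there; because net benefit depends directly on predicted probabilities, poor calibration can erode it even when AUROC is high.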
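The abstract reports 95% confidence intervals for AUROC without stating how they were obtained. A percentile bootstrap over the test set is one common choice and is assumed here; the helpers below are hypothetical and compute AUROC from the Mann-Whitney rank-sum statistic, using only the standard library.

```python
import random


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney rank-sum statistic, with tied scores sharing ranks."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval; labels must be 0/1."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    return auroc(y_true, y_score), stats[lo_idx], stats[n_boot - 1 - lo_idx]


# Synthetic data at roughly the cohort's 10.7% event rate, with weakly separating scores.
rng = random.Random(1)
y = [1 if rng.random() < 0.107 else 0 for _ in range(2000)]
scores = [0.6 * yi + rng.random() for yi in y]
point, lo, hi = bootstrap_auroc_ci(y, scores, n_boot=200)
print(f"AUROC {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Other interval methods (for example, DeLong's analytic variance) are also widely used and would give somewhat different bounds.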

Matching journals

The top 7 journals together account for at least 50% of the predicted probability mass.

Rank. Journal (papers in training set; percentile; predicted probability)

1. npj Digital Medicine (97; Top 0.4%; 14.0%)
2. Scientific Reports (3102; Top 4%; 12.2%)
3. JMIR Medical Informatics (17; Top 0.1%; 8.2%)
4. BMC Medical Informatics and Decision Making (39; Top 0.3%; 7.0%)
5. Journal of Medical Internet Research (85; Top 1%; 4.1%)
6. Critical Care Explorations (15; Top 0.1%; 3.5%)
7. PLOS ONE (4510; Top 40%; 3.5%)
(50% of probability mass above this line)
8. Journal of the American Medical Informatics Association (61; Top 0.8%; 3.5%)
9. International Journal of Medical Informatics (25; Top 0.4%; 3.5%)
10. Frontiers in Artificial Intelligence (18; Top 0.1%; 3.0%)
11. The Lancet Digital Health (25; Top 0.2%; 3.0%)
12. JCO Clinical Cancer Informatics (18; Top 0.3%; 2.5%)
13. PLOS Digital Health (91; Top 1%; 2.0%)
14. JAMIA Open (37; Top 0.7%; 1.8%)
15. Critical Care (14; Top 0.3%; 1.4%)
16. eBioMedicine (130; Top 2%; 1.4%)
17. Frontiers in Digital Health (20; Top 0.8%; 1.4%)
18. Frontiers in Medicine (113; Top 4%; 1.3%)
19. BMC Medical Research Methodology (43; Top 0.9%; 1.2%)
20. Journal of Infection (71; Top 2%; 1.2%)
21. JAMA Network Open (127; Top 3%; 0.9%)
22. Computers in Biology and Medicine (120; Top 4%; 0.9%)
23. European Respiratory Journal (54; Top 2%; 0.9%)
24. Nature Communications (4913; Top 62%; 0.8%)
25. BMJ Health & Care Informatics (13; Top 0.9%; 0.8%)
26. BMC Medicine (163; Top 7%; 0.7%)
27. European Heart Journal - Digital Health (15; Top 0.6%; 0.7%)
28. Biology Methods and Protocols (53; Top 3%; 0.7%)
29. Annals of Internal Medicine (27; Top 1%; 0.7%)