Observation-process features are associated with larger domain shift in sepsis mortality prediction: a cross-database evaluation using MIMIC-IV and eICU-CRD

Yamamoto, R.; Wu, F.; Sprehe, L. K.; Abeer, A.; Celi, L. A.; Tohyama, T.

2026-04-06 intensive care and critical care medicine
10.64898/2026.04.05.26350209 medRxiv
Clinical prediction models for sepsis frequently degrade when applied outside the development setting. Electronic health record data encode not only patient physiology but also observation processes such as measurement timing and frequency, which may be predictive within a site but unstable across sites. The contribution of these observation-process features to cross-site performance degradation has not been quantified. In this retrospective cohort study, we developed models for in-hospital mortality in adult intensive care unit (ICU) patients meeting Sepsis-3 criteria using the Medical Information Mart for Intensive Care IV (MIMIC-IV) (n = 30,218; 16.3% mortality) and externally validated them in the eICU Collaborative Research Database (eICU-CRD) (n = 31,403; 13.9% mortality). We compared seven prespecified model specifications representing physiologic summary strategies (a single aggregate severity score, most recent values, extreme values, and within-window variability), each evaluated with and without measurement counts as observation-process features. Models were fit using logistic regression and gradient-boosted trees. Internally, discrimination improved with more detailed physiologic summaries and measurement counts (logistic regression area under the receiver operating characteristic curve [AUROC] increasing from 0.819 to 0.834). In external validation, performance drops were larger for specifications using more complex physiologic representations. Adding measurement counts was associated with larger domain shift (AUROC change in logistic regression, -0.047 without counts versus -0.082 with counts). External calibration deteriorated progressively, with calibration slopes in logistic regression decreasing from 1.007 for the simplest specification to 0.417 for the most complex. Gradient-boosted trees showed smaller incremental degradation from measurement counts but still exhibited domain shift in complex specifications.
Inclusion of observation-process features in sepsis mortality prediction models was associated with improved internal discrimination but worse external calibration and transportability. These findings highlight that feature engineering decisions involve a tradeoff between internal performance and external generalizability, and that calibration assessment provides the most sensitive indicator of reduced transportability.
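
The two metrics reported in the abstract, AUROC and calibration slope, can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the helper names `auroc` and `calibration_slope` are hypothetical, and the "overconfident" model is simply an assumed well-calibrated prediction with its logit doubled. It shows why calibration slope can reveal a problem that AUROC alone misses: a monotone distortion of the predicted risk leaves the ranking (and therefore AUROC) unchanged while pulling the slope well below 1.

```python
import math
import random


def auroc(y_true, y_score):
    """Rank-based AUROC (Mann-Whitney U), using average ranks for ties."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_slope(y_true, p_pred, n_iter=25):
    """Fit y ~ sigmoid(a + b * logit(p)) by Newton-Raphson; return (a, b).

    b is the calibration slope: about 1 for well-calibrated predictions,
    below 1 when predictions are too extreme (overfitting-like behaviour).
    """
    eps = 1e-12
    x = []
    for p in p_pred:
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        x.append(math.log(p / (1 - p)))
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w                 # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


# Synthetic cohort (assumption): true log-odds of death drawn from a normal.
random.seed(0)
true_logit = [random.gauss(-2.0, 1.5) for _ in range(5000)]
y = [1 if random.random() < sigmoid(z) else 0 for z in true_logit]

p_calibrated = [sigmoid(z) for z in true_logit]         # ideal predictions
p_overconfident = [sigmoid(2 * z) for z in true_logit]  # logits too extreme

auc_cal = auroc(y, p_calibrated)
auc_over = auroc(y, p_overconfident)
_, slope_cal = calibration_slope(y, p_calibrated)
_, slope_over = calibration_slope(y, p_overconfident)

print(f"AUROC: calibrated={auc_cal:.3f}, overconfident={auc_over:.3f}")
print(f"Slope: calibrated={slope_cal:.3f}, overconfident={slope_over:.3f}")
```

In this toy setup the two prediction sets rank patients identically, so discrimination is the same, while the slope for the distorted predictions falls to roughly half that of the calibrated ones. In practice one would typically use an established library for both metrics rather than hand-rolled routines.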

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | npj Digital Medicine | 97 | Top 0.1% | 25.6% |
| 2 | Scientific Reports | 3102 | Top 7% | 10.0% |
| 3 | Critical Care Explorations | 15 | Top 0.1% | 6.3% |
| 4 | PLOS Computational Biology | 1633 | Top 6% | 6.3% |
| 5 | BMC Medical Informatics and Decision Making | 39 | Top 0.6% | 4.8% |
| | 50% of probability mass above | | | |
| 6 | Journal of the American Medical Informatics Association | 61 | Top 0.6% | 4.1% |
| 7 | PLOS Digital Health | 91 | Top 0.8% | 3.6% |
| 8 | iScience | 1063 | Top 6% | 3.0% |
| 9 | Frontiers in Medicine | 113 | Top 2% | 2.7% |
| 10 | eBioMedicine | 130 | Top 0.5% | 2.7% |
| 11 | Journal of Biomedical Informatics | 45 | Top 0.7% | 2.1% |
| 12 | PLOS ONE | 4510 | Top 49% | 2.1% |
| 13 | JAMIA Open | 37 | Top 0.7% | 1.9% |
| 14 | Journal of Medical Internet Research | 85 | Top 3% | 1.7% |
| 15 | Bioinformatics | 1061 | Top 7% | 1.7% |
| 16 | Clinical Chemistry | 22 | Top 0.4% | 1.7% |
| 17 | International Journal of Medical Informatics | 25 | Top 0.9% | 1.6% |
| 18 | eLife | 5422 | Top 47% | 1.3% |
| 19 | Frontiers in Physiology | 93 | Top 4% | 1.3% |
| 20 | European Respiratory Journal | 54 | Top 1% | 0.9% |
| 21 | Computers in Biology and Medicine | 120 | Top 4% | 0.9% |
| 22 | Physiological Measurement | 12 | Top 0.3% | 0.9% |
| 23 | Nature Medicine | 117 | Top 5% | 0.7% |
| 24 | JMIR Medical Informatics | 17 | Top 2% | 0.7% |
| 25 | Biology Methods and Protocols | 53 | Top 3% | 0.7% |
| 26 | Genome Medicine | 154 | Top 9% | 0.6% |
| 27 | Archives of Clinical and Biomedical Research | 28 | Top 3% | 0.6% |
| 28 | Pediatric Research | 18 | Top 0.5% | 0.6% |