
Developing and externally validating machine learning models to forecast short-term risk of ventilator-associated pneumonia

Peltekian, A. K.; Liao, W.-T.; Guggilla, V.; Markov, N. S.; Senkow, K.; Liao, Z.; Kang, M.; Rasmussen, L. V.; Tavernier, E.; Ehrmann, S.; Clepp, R. K.; Stoeger, T.; Walunas, T.; Choudhary, A. N.; Misharin, A. V.; Singer, B. D.; Budinger, G. S.; Wunderink, R. G.; Gao, C. A.; Agrawal, A.

2026-01-30 intensive care and critical care medicine
10.64898/2026.01.28.26344858 medRxiv

Purpose: Ventilator-associated pneumonia (VAP) remains one of the most serious hospital-acquired infections in the intensive care unit (ICU), with high morbidity and mortality. Early identification of patients at risk for developing VAP could enable timely diagnostics and intervention. However, current clinical tools are limited in their ability to detect early physiologic signals preceding VAP onset. We aimed to build supervised machine learning models to predict short-term onset of VAP.

Methods: We analyzed electronic health record data from a prospective observational cohort of ICU patients, where VAP was adjudicated using a standardized published protocol by a panel of critical care physicians. Clinical features (including vital signs, ventilator settings, laboratory values, and support devices) were extracted for each patient-ICU-day. We explored unsupervised clustering to characterize feature dynamics associated with VAP onset. We built multiple machine learning models across different prediction windows (3, 5, 7 days before VAP). We examined model performance in two external cohorts: MIMIC-IV and a secondary analysis of the AMIKINHAL trial. Results were evaluated with discrimination metrics such as AUROC.

Results: The internal cohort included 507 patients with BAL-confirmed diagnoses: 261 developed VAP and 246 did not. Visualization using clustering identified distinct physiologic states enriched for VAP-labeled days. The best-performing model achieved an AUROC of 0.866 in predicting VAP up to seven days before clinical diagnosis. Temporal model probability trajectories showed rising model confidence in the days leading up to VAP. On external validation in MIMIC-IV, the best model achieved an AUROC of 0.817 for forecasting VAP within five days. There was low feature overlap with the AMIKINHAL trial data, leading to poor model performance. Feature analysis revealed that platelet count, positive end-expiratory pressure (PEEP), ventilator duration, and inflammatory markers were key drivers of model predictions.

Conclusions: Machine learning models trained on routinely collected ICU data with careful labeling can anticipate VAP onset up to a week in advance with strong predictive performance. Model performance generalized to data from an entirely different hospital system despite differences in practice and labeling patterns, but the models did not perform well when there was poor feature overlap. Future work should focus on real-time prospective evaluation.
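The prediction-window setup and the AUROC evaluation mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the authors' exact labeling rule, so the rule below (a patient-ICU-day is positive when VAP onset falls within the next `window` days, and days on or after onset are negative) is an assumption, and the function names `label_days` and `auroc` are hypothetical. AUROC is computed here from its pairwise-ranking definition rather than with any particular library.

```python
from typing import List, Optional


def label_days(icu_days: List[int], vap_day: Optional[int], window: int) -> List[int]:
    """Label each ICU day 1 if VAP onset falls within the next `window` days.

    Assumption for illustration: days on or after the onset day are labeled 0.
    A real pipeline might drop those days instead.
    """
    labels = []
    for day in icu_days:
        in_window = vap_day is not None and 0 < vap_day - day <= window
        labels.append(1 if in_window else 0)
    return labels


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy patient: ten ICU days, VAP adjudicated on day 8, 3-day window.
    print(label_days(list(range(1, 11)), vap_day=8, window=3))
    # Toy scores for four patient-days (made-up numbers, not study data).
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Widening `window` to 5 or 7 marks more of the pre-onset days as positive, which is one plausible way to read the 3-, 5-, and 7-day prediction windows described above.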

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Scientific Reports: 3102 papers in training set, Top 4%, 12.3%
2. PLOS ONE: 4510 papers in training set, Top 19%, 10.1%
3. Critical Care Explorations: 15 papers in training set, Top 0.1%, 10.1%
4. European Respiratory Journal: 54 papers in training set, Top 0.2%, 7.2%
5. Clinical Chemistry: 22 papers in training set, Top 0.1%, 4.8%
6. PLOS Digital Health: 91 papers in training set, Top 0.7%, 3.6%
7. eBioMedicine: 130 papers in training set, Top 0.4%, 3.1%
50% of probability mass above
8. PLOS Computational Biology: 1633 papers in training set, Top 11%, 2.9%
9. Frontiers in Medicine: 113 papers in training set, Top 2%, 2.6%
10. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 1%, 2.6%
11. Critical Care: 14 papers in training set, Top 0.2%, 2.1%
12. EClinicalMedicine: 21 papers in training set, Top 0.1%, 2.1%
13. The Journal of Infectious Diseases: 182 papers in training set, Top 2%, 1.9%
14. International Journal of Medical Informatics: 25 papers in training set, Top 0.9%, 1.7%
15. Journal of Medical Internet Research: 85 papers in training set, Top 3%, 1.7%
16. npj Digital Medicine: 97 papers in training set, Top 2%, 1.7%
17. Physiological Measurement: 12 papers in training set, Top 0.2%, 1.7%
18. JMIR Medical Informatics: 17 papers in training set, Top 0.7%, 1.7%
19. Frontiers in Physiology: 93 papers in training set, Top 3%, 1.5%
20. JMIR Public Health and Surveillance: 45 papers in training set, Top 2%, 1.3%
21. Epidemiology and Infection: 84 papers in training set, Top 2%, 1.1%
22. Journal of Infection: 71 papers in training set, Top 2%, 1.1%
23. Wellcome Open Research: 57 papers in training set, Top 2%, 0.9%
24. iScience: 1063 papers in training set, Top 25%, 0.9%
25. BMJ Open: 554 papers in training set, Top 12%, 0.9%
26. Informatics in Medicine Unlocked: 21 papers in training set, Top 1%, 0.8%
27. American Journal of Respiratory Cell and Molecular Biology: 38 papers in training set, Top 0.7%, 0.8%
28. BMC Medicine: 163 papers in training set, Top 7%, 0.7%
29. Clinical Infectious Diseases: 231 papers in training set, Top 5%, 0.7%
30. Journal of the American Medical Informatics Association: 61 papers in training set, Top 2%, 0.7%
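
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked from the list above with a running sum. This is a minimal sketch: the percentages are copied from the 30 listed entries, and the helper name `journals_to_reach` is hypothetical, not part of the page that produced the ranking.

```python
from itertools import accumulate
from typing import List, Optional

# Predicted probabilities (percent) for ranks 1-30, copied from the list above.
PROBS = [
    12.3, 10.1, 10.1, 7.2, 4.8, 3.6, 3.1, 2.9, 2.6, 2.6,
    2.1, 2.1, 1.9, 1.7, 1.7, 1.7, 1.7, 1.7, 1.5, 1.3,
    1.1, 1.1, 0.9, 0.9, 0.9, 0.8, 0.8, 0.7, 0.7, 0.7,
]


def journals_to_reach(probs: List[float], threshold: float) -> Optional[int]:
    """Return the smallest rank whose cumulative probability reaches `threshold`.

    Returns None if the listed probabilities never reach the threshold.
    """
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None


if __name__ == "__main__":
    print(journals_to_reach(PROBS, 50.0))  # rank at which 50% is first reached
```

The running total after six journals is about 48.1% and after seven about 51.2%, so the 50% cutoff falls at rank 7, consistent with the divider in the list.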