Predicting Near-term Mortality in Heart Failure: External Validation of Electronic Health Record-Based Deep Learning Model
McGilvray, M. M. O.; Pawale, A.; Roberts, S.; Shepherd, H. M.; Wilcox, A.; Heaton, J.; Pasque, M. K.
Structured Abstract

Background: The dire consequences of heart failure (HF) patient non-response to guideline-directed medical therapy often fuel early, non-selective referral for surgical intervention (ventricular assist device [VAD] or transplant). The high risk associated with these interventions mandates precision in directing them only toward those patients who would otherwise suffer severe near-term deterioration. We previously reported a 52,265-patient deep learning model that predicted 1-year severe decompensation/death in HF inpatients, with a C-statistic of 0.91. We now present external model validation. Few groups applying deep learning to large-scale datasets have achieved external validation using equally large-scale independent datasets, yet proof of generalization is essential to practical applicability.

Methods: Our previous study used standard electronic health record (EHR) data to build ensemble deep learning models employing time-series and densely connected networks. The positive class included both all-cause mortality and referral for HF surgical intervention within 1 year. In the current study, we assessed generalization of model architecture in an external validation test set from the Veterans Cardiac Health and Artificial Intelligence Model Predictions (V-CHAMPS) challenge, a synthetic national governmental sample using a distinct EHR system. While V-CHAMPS is a robust dataset, variables that capture VAD/transplant referral were not readily extracted, limiting the positive class to mortality only.

Results: A total of 380,441 distinct admissions from 75,086 HF patients contributed >720 million EHR datapoints. Overall, 23% of observations fit positive-class criteria. The model C-statistic in the external-validation cohort was 0.79.

Conclusions: Despite being developed in a single-center dataset with a more precise positive class, our model architecture maintained relative accuracy when applied to a national sample in an unrelated EHR system. This supports clinical relevancy of the deep-learning model and adaptability with retraining to disparate contexts. This broad applicability suggests considerable potential of EHR-based deep learning models to assist HF clinicians in improving the use of advanced surgical therapy.