Heart Failure Prediction & Risk Stratification using Machine Learning
Ali, S.; Leavitt, M. A.; Asghar, W.
Heart failure (HF) is a leading cause of morbidity, mortality, and healthcare expenditure; approximately 6.7 million adults in the U.S. live with the condition, which contributes to hundreds of thousands of deaths annually. Identifying high-risk individuals early remains a challenge because HF-specific symptoms are often ignored or misattributed to normal aging, stress, or minor illness, which delays diagnosis. We trained and evaluated several models for HF prediction, including logistic regression, SVM, KNN, random forest, XGBoost, MLP, and a custom stacked ensemble, using stratified 5-fold cross-validation (CV) and 70/30 hold-out splits on routinely available electronic medical record (EMR) data from the All of Us Research Program. The cohort comprised 37,070 adults (13,577 HF; 23,493 non-HF). Predictors included readily available demographics, vital signs, conditions, and laboratory results. Preprocessing included IQR-based winsorization, median imputation, one-hot encoding, and quantile transformation (QuantileTransformer). The stacked model achieved a ROC-AUC of 0.927, a PR-AUC of 0.895, and an accuracy of 0.856 on the test set. To support real-world deployment, we calibrated the predicted probabilities and adjusted them to a realistic population prevalence, yielding interpretable probability estimates and a clear stratification of individuals into clinically actionable risk tiers. SHAP analysis identified atrial fibrillation, age, hypertensive disorder, sodium, and deprivation index as the five features with the greatest influence on the model's predictions. A secondary multiclass experiment (No-HF, HF with reduced ejection fraction, and HF with preserved ejection fraction) achieved lower discrimination (macro-AUC ~0.87) and lower per-class precision and recall, presumably due to label noise, class imbalance, and overlapping phenotypes.
We have demonstrated that a carefully calibrated stacked ensemble built on readily available EMR variables can achieve strong discrimination for HF, making it an effective tool for an AI clinical decision support system (AI-CDSS) in population screening and proactive care pathways.
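The abstract does not say how the calibrated probabilities were adjusted to a population prevalence. One common approach is a prior-shift (odds-ratio) correction, and the sketch below illustrates that technique only; it is not necessarily the authors' method. The training prevalence is derived from the cohort counts given above (13,577 of 37,070). The target prevalence of 0.03 and the tier cutoffs are hypothetical placeholders, and the function names `adjust_to_prevalence` and `risk_tier` are illustrative.

```python
# Prior-shift correction: rescale a calibrated probability from the
# prevalence seen in training to an assumed deployment prevalence.
# This is an illustrative sketch, not the authors' published procedure.

TRAIN_PREV = 13577 / 37070   # HF share of the study cohort (from the abstract)
TARGET_PREV = 0.03           # ASSUMPTION: hypothetical population prevalence


def adjust_to_prevalence(p, train_prev, target_prev):
    """Return p re-expressed under target_prev instead of train_prev.

    Equivalent to multiplying the odds of p by the ratio of target
    prior odds to training prior odds.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    for prev in (train_prev, target_prev):
        if not 0.0 < prev < 1.0:
            raise ValueError("prevalences must lie strictly between 0 and 1")
    pos = p * target_prev / train_prev
    neg = (1.0 - p) * (1.0 - target_prev) / (1.0 - train_prev)
    return pos / (pos + neg)


def risk_tier(p, cutoffs=(0.05, 0.20)):
    """Map an adjusted probability to a tier.

    ASSUMPTION: the cutoffs are placeholders, not values from the paper.
    """
    low_cut, high_cut = cutoffs
    if p < low_cut:
        return "low"
    if p < high_cut:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    for raw in (0.10, 0.50, 0.90):
        adj = adjust_to_prevalence(raw, TRAIN_PREV, TARGET_PREV)
        print(f"calibrated={raw:.2f} adjusted={adj:.3f} tier={risk_tier(adj)}")
```

A useful sanity check on this correction is that a score equal to the training prevalence maps exactly to the target prevalence, and that the ordering of individuals is preserved, so ROC-AUC is unchanged by the adjustment.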
Matching journals
The top 4 journals account for 50% of the predicted probability mass.