Condition-Specific Readmission Risk Stratification in a Predominantly Black Statewide Cohort Using Machine Learning: Development of Subtype-Specific Models for Heart Failure, Acute Myocardial Infarction, Atrial Fibrillation/Flutter, and Hypertensive Heart Disease
EL Moudden, I.; Bittner, M.; Dodani, S.
Background: Cardiovascular disease (CVD) readmissions impose a substantial clinical and economic burden. Machine learning (ML) may improve risk stratification, yet most predictive models aggregate CVD subtypes into a single outcome and underrepresent Black populations.
Methods: Using Virginia Health Information database records (2010 to 2020), we analyzed 157,791 discharge records from 123,272 unique patients (96.6% Black) to develop condition-specific 30-day readmission models for heart failure (HF; n=91,752), acute myocardial infarction (AMI; n=34,497), atrial fibrillation/flutter (AF/AFL; n=18,424), and hypertensive heart disease (HHD; n=13,118). Four algorithms (XGBoost, LightGBM, Random Forest, Elastic Net) plus a Super Learner ensemble were trained on patient-grouped 70/30 splits, with and without Synthetic Minority Oversampling Technique (SMOTE) balancing. Models incorporated validated clinical indices (LACE, Charlson, Elixhauser) and administrative proxies for social determinants of health.
Results: The overall 30-day readmission rate was 18.9%. Best area under the receiver operating characteristic curve (AUC) values by condition were HF 0.708 (95% CI, 0.701 to 0.716), AMI 0.706 (95% CI, 0.691 to 0.721), AF/AFL 0.732 (95% CI, 0.715 to 0.750), and HHD 0.758 (95% CI, 0.735 to 0.777). XGBoost was the top-performing algorithm for three of four subtypes. The LACE Index, Charlson Comorbidity Index, and insurance type were consistently the strongest predictors. Algorithm-native, aggregated, and SHAP-based importance measures converged on these key features.
Conclusions: In this predominantly Black statewide cohort, the largest to date, condition-specific ML models achieved moderate-to-high discrimination for HF, AMI, AF/AFL, and HHD. Key clinical indices and administrative proxies for social determinants of health emerged as dominant predictors, highlighting modifiable targets and high-risk subgroups.
These findings support the development of precision, equity-informed readmission interventions and provide a scalable framework for deploying ML-driven decision support in safety-net and minority-serving healthcare systems.
WHAT IS KNOWN
* Machine learning models for cardiovascular readmission prediction have largely aggregated disease subtypes and underrepresented Black populations.
* Most existing studies lack head-to-head algorithm comparisons within racially concentrated cohorts and omit social determinants of health proxies.
WHAT THE STUDY ADDS
* Condition-specific models for four cardiovascular subtypes achieved moderate-to-high discrimination (AUC 0.706 to 0.758) in the largest machine learning-based analysis of a predominantly Black statewide cohort.
* Validated clinical indices (LACE, Charlson) and insurance type consistently emerged as dominant predictors, identifying modifiable targets for equity-informed intervention.
* The scalable, administrative-data-only framework supports deployment of subtype-specific readmission decision support in safety-net and minority-serving health systems.
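The abstract names patient-grouped 70/30 splits and reports AUCs with 95% confidence intervals but does not describe how either was computed. The following is a minimal standard-library Python sketch of one plausible reading, not the authors' pipeline: whole patients are assigned to a single partition so that repeat discharges cannot leak across the split, AUC is computed from ranks, and the interval is a percentile bootstrap (an assumption; the abstract does not name the CI method). The function names, the synthetic data, and all parameter values are hypothetical, and model training and SMOTE are omitted.

```python
import random


def patient_grouped_split(patient_ids, test_fraction=0.3, seed=0):
    """Assign whole patients to train or test so no patient spans both sets."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = round(len(unique) * test_fraction)
    test_patients = set(unique[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (assumed method, not from the source).

    Requires both classes to be present in `labels`.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic illustration only: 200 hypothetical patients, 2 discharges each.
demo_rng = random.Random(42)
patient_ids = [i // 2 for i in range(400)]
labels = [1 if demo_rng.random() < 0.19 else 0 for _ in patient_ids]
scores = [0.3 * y + demo_rng.random() for y in labels]  # stand-in risk scores

train_idx, test_idx = patient_grouped_split(patient_ids)
test_labels = [labels[i] for i in test_idx]
test_scores = [scores[i] for i in test_idx]
print("test AUC:", auc(test_labels, test_scores))
print("95% CI:", bootstrap_auc_ci(test_labels, test_scores, n_boot=200))
```

Splitting on patient identifiers rather than on discharge records matters here because the cohort contains more records (157,791) than unique patients (123,272); a record-level split would place some patients in both partitions and could inflate the apparent discrimination.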