ESUS-AI: a machine learning framework to estimate the most likely embolic source in embolic stroke of undetermined source
Bonura, A.; Juega, J.; Meza, C.; Kühne Escola, J.; Muchada, M.; Rubiera, M.; Olive Gadea, M.; Requena, M.; Rodrigo-Gisbert, M.; Rodriguez-Villatoro, N.; Rodriguez-Luna, D.; Rizzo, F.; Fiore, G. M.; Simonetti, R.; Brunelli, N.; Fernandez-Galera, R.; Francisco Pascal, J.; Colangelo, G.; Ribo, M.; Molina, C. A.; Pagola, J.
Background and Purpose: Embolic stroke of undetermined source (ESUS) remains a major diagnostic challenge in vascular neurology, as a substantial proportion of patients lack an identifiable embolic source despite a standardized diagnostic workup. The failure of empiric anticoagulation strategies highlights the need for individualized, mechanism-oriented risk stratification. We aimed to develop a machine learning-based framework to estimate the most likely embolic source in ESUS using routinely available clinical data.

Methods: We retrospectively analyzed consecutive ESUS patients admitted to the Stroke Unit of Vall d'Hebron Hospital between 2020 and 2024. Three supervised machine learning models (XGBoost, Random Forest, and regularized logistic regression) were trained to independently predict the presence of left atrial enlargement (LAE), left ventricular dysfunction or akinesia (LVD), and complex aortic plaques (AP), based on demographic, clinical, laboratory, and imaging variables available at diagnosis. Model interpretability was assessed using permutation importance and SHAP analyses.

Results: Among 1,741 ESUS patients (mean age 71.5 ± 14.6 years; 48.3% women), LAE was present in 40.5%, AP in 11.0%, and LVD in 6.5%. XGBoost achieved the best overall performance across targets (PR-AUC: 0.71 for LAE, 0.29 for AP, 0.44 for LVD). Distinct and biologically coherent risk profiles emerged. LAE was driven by older age, elevated NT-proBNP, higher stroke severity, and a non-linear association with cholesterol. AP was associated with advanced age and traditional vascular risk factors. LVD showed a cardiomyopathic pattern characterized by elevated NT-proBNP, younger age, male sex, and severe strokes.

Conclusions: A machine learning-based approach can provide probabilistic, mechanism-oriented stratification in ESUS, capturing non-linear interactions among routinely available variables. This framework may support clinicians in prioritizing targeted diagnostic pathways and tailoring secondary prevention strategies, pending external validation.
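The three reported PR-AUC values are not directly comparable, because the chance-level PR-AUC of an uninformative classifier is roughly the prevalence of the target. The sketch below is one way to read the figures under that assumption. The abstract does not say which PR-AUC estimator the authors used, so `average_precision` here is only one common choice, and the helper name `lift_over_chance` and the toy labels are hypothetical illustrations, not part of the study.

```python
def average_precision(y_true, scores):
    """Average precision, a common PR-AUC estimate (assumed, not confirmed by the abstract)."""
    # Rank samples from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top `rank` samples, recorded at each positive.
            total += hits / rank
    return total / n_pos


def lift_over_chance(pr_auc, prevalence):
    """Ratio of a PR-AUC to its approximate chance level (the prevalence)."""
    return pr_auc / prevalence


# (PR-AUC, prevalence) pairs taken from the Results paragraph of the abstract.
reported = {"LAE": (0.71, 0.405), "AP": (0.29, 0.110), "LVD": (0.44, 0.065)}
lifts = {target: round(lift_over_chance(*pair), 2) for target, pair in reported.items()}

# Toy data (hypothetical) just to exercise average_precision.
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])

if __name__ == "__main__":
    print(lifts)
    print(round(toy_ap, 4))
```

Read this way, the LVD model (0.44 against a prevalence of 6.5%) sits about 6.8 times above chance, AP about 2.6 times, and LAE about 1.8 times, even though LAE has the highest absolute PR-AUC.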
Matching journals
The top 8 journals account for 50% of the predicted probability mass.