
A Heterogeneous Graph Neural Network Framework for Multi-Horizon Stroke Mortality Prediction

Tharzeen, A.; Vafaei Sadr, A.; Radfar, N.; Hwang, W.; Abedi, V.; Zand, R.

2026-06-10 health informatics
10.64898/2026.06.09.26355176 medRxiv

Background: Machine learning models for stroke mortality prediction typically treat each time horizon independently and use flat tabular features that ignore the relational structure of electronic health records (EHRs). In this pilot study, we used graph-based machine learning to predict post-stroke all-cause mortality at three time horizons. Methods: We developed Stroke Temporal Heterogeneous Graph (StrokeTHG), a heterogeneous graph neural network that predicts 30-day, 90-day, and 1-year stroke mortality simultaneously, using EHR data from the Penn State Health System. The model encodes relations among EHR entities (e.g., patient, diagnosis, comorbidity) together with a temporal encoding of admission time. We compared our approach against baseline methods including Logistic Regression, Random Forest, and XGBoost. We also performed ablation and subgroup analyses, evaluated the quality of the learned graph embeddings, and assessed the importance of the different edge types in the graph. Results: We included 4,144 stroke patients (mean age 69.2 years; 54.3% men), of whom 3,332 (80.4%) were alive one year after their stroke. The 30-day, 90-day, and 1-year mortality rates were 9.7%, 13.7%, and 19.6%, respectively. StrokeTHG achieved AUROCs of 0.872, 0.878, and 0.837 (30-day, 90-day, and 1-year, respectively), outperforming all tabular baselines. At ≥75% specificity, the model identified 5-10 percentage points more mortality cases than the best baseline at each horizon. Subgroup analysis showed consistent performance across sex subgroups, with the largest discriminative gains in the 65-80 age stratum. Edge-type ablation identified phenotype-patient and admission-patient edges in the constructed EHR graph as the most influential relations for mortality prediction.
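A minimal sketch of how EHR records might be turned into a heterogeneous graph, and how an edge-type ablation removes one relation before retraining. The record fields (`patient`, `admission`, `diagnoses`, `phenotypes`), the relation names, and the helper functions are hypothetical, because the abstract does not describe StrokeTHG's schema. Edge types are keyed as `(source_type, relation, target_type)` triples, the same convention PyTorch Geometric's `HeteroData` uses. The example itself is plain Python so it runs without dependencies.

```python
from collections import defaultdict

# Hypothetical toy EHR records; field names are illustrative, not the study's schema.
records = [
    {"patient": "P1", "admission": "A1", "diagnoses": ["I63.9"], "phenotypes": ["hemiparesis"]},
    {"patient": "P2", "admission": "A2", "diagnoses": ["I63.9", "I10"], "phenotypes": ["aphasia"]},
    {"patient": "P2", "admission": "A3", "diagnoses": ["E11.9"], "phenotypes": []},
]

def build_hetero_graph(records):
    """Map each (src_type, relation, dst_type) triple to a sorted list of (src, dst) edges."""
    edges = defaultdict(set)  # sets deduplicate repeated entity pairs
    for r in records:
        p = r["patient"]
        edges[("admission", "of", "patient")].add((r["admission"], p))
        for dx in r["diagnoses"]:
            edges[("diagnosis", "assigned_to", "patient")].add((dx, p))
        for ph in r["phenotypes"]:
            edges[("phenotype", "observed_in", "patient")].add((ph, p))
    return {et: sorted(pairs) for et, pairs in edges.items()}

def drop_edge_type(graph, edge_type):
    """Edge-type ablation: return a copy of the graph without one relation."""
    return {et: pairs for et, pairs in graph.items() if et != edge_type}

graph = build_hetero_graph(records)
for et, pairs in graph.items():
    print(et, len(pairs))

# Ablate the hypothetical phenotype-patient relation.
ablated = drop_edge_type(graph, ("phenotype", "observed_in", "patient"))
print(sorted(ablated))
```

In a full pipeline, each relation would typically get its own message-passing weights (as in R-GCN or PyTorch Geometric's `HeteroConv`). The ablation would then compare downstream AUROC with and without the dropped edge type.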
StrokeTHG embeddings outperformed all graph and matrix factorization baselines under an identical downstream classifier, confirming that the performance gains stem from representation quality rather than classifier capacity. Conclusions: StrokeTHG demonstrates that heterogeneous graph representations of EHR data provide a consistent improvement over flat tabular models for multi-horizon stroke mortality prediction. The improvement is most pronounced at clinically actionable sensitivity thresholds, and the model adds a novel capability for monotonic multi-horizon prediction (i.e., predicted risk does not decrease as the horizon lengthens). The framework may be adaptable to other EHR-based clinical studies that seek to exploit heterogeneous relational structure for predictive modeling.
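Two properties named in the conclusions can be sketched concretely. The first function shows one common way, not necessarily StrokeTHG's, to make horizon-specific probabilities monotone. It predicts a 30-day logit plus non-negative increments (via softplus) for the later horizons, so the cumulative logits, and therefore the sigmoid probabilities, never decrease. The second function computes sensitivity at a specificity floor, the kind of operating point behind the "75% specificity" comparison. The exhaustive threshold search and the `score >= threshold` rule are assumptions, since the abstract does not state how thresholds were chosen. All function names are hypothetical.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); always >= 0."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def monotone_horizon_probs(base_logit: float, raw_increments: list[float]) -> list[float]:
    """Hypothetical head: 30-day logit, then softplus increments for later horizons.

    The cumulative logits are non-decreasing and sigmoid is increasing, so the
    returned probabilities are non-decreasing across horizons.
    """
    logits = [base_logit]
    for raw in raw_increments:
        logits.append(logits[-1] + softplus(raw))
    return [sigmoid(z) for z in logits]

def sensitivity_at_specificity(y_true: list[int], scores: list[float],
                               min_specificity: float = 0.75) -> float:
    """Best sensitivity over thresholds whose specificity meets the floor.

    A case is predicted positive when score >= threshold. +inf is included so
    that a threshold with specificity 1.0 always exists.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in sorted(set(scores)) + [math.inf]:
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        if tn / n_neg >= min_specificity:
            best = max(best, tp / n_pos)
    return best

# Toy usage with made-up numbers (not study data).
probs = monotone_horizon_probs(-2.0, [-1.0, 0.5])  # 30-day, 90-day, 1-year
print([round(p, 3) for p in probs])
y = [0, 0, 0, 0, 1, 1]
s = [0.1, 0.2, 0.3, 0.6, 0.5, 0.9]
print(sensitivity_at_specificity(y, s, 0.75))  # 1.0
```

Comparing two models at a fixed specificity floor means calling `sensitivity_at_specificity` on each model's scores and taking the difference. A gap of 0.05 to 0.10 corresponds to the "5-10 percentage points" framing.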

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Top % | Predicted probability |
|---|---|---|---|---|
| 1 | Journal of Biomedical Informatics | 45 | Top 0.1% | 28.7% |
| 2 | Journal of the American Medical Informatics Association | 61 | Top 0.1% | 23.4% |
| 3 | npj Digital Medicine | 97 | Top 0.5% | 10.8% |
| 4 | BMC Medical Informatics and Decision Making | 39 | Top 0.2% | 10.5% |
| 5 | JMIR Medical Informatics | 17 | Top 0.4% | 2.8% |
| 6 | Journal of Medical Internet Research | 85 | Top 2% | 2.7% |
| 7 | JAMIA Open | 37 | Top 0.6% | 2.4% |
| 8 | Scientific Reports | 3102 | Top 48% | 2.2% |
| 9 | International Journal of Medical Informatics | 25 | Top 0.8% | 1.8% |
| 10 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 1% | 1.4% |
| 11 | Artificial Intelligence in Medicine | 15 | Top 0.5% | 1.0% |
| 12 | Journal of the American Heart Association | 119 | Top 4% | 0.9% |
| 13 | BMC Medical Research Methodology | 43 | Top 1% | 0.9% |
| 14 | The Lancet Digital Health | 25 | Top 0.8% | 0.9% |
| 15 | Computer Methods and Programs in Biomedicine | 27 | Top 0.9% | 0.8% |
| 16 | BioData Mining | 15 | Top 0.8% | 0.8% |
| 17 | Journal of Personalized Medicine | 28 | Top 1% | 0.8% |
| 18 | Cureus | 67 | Top 5% | 0.7% |
| 19 | Schizophrenia | 19 | Top 0.4% | 0.7% |
| 20 | PLOS ONE | 4510 | Top 72% | 0.5% |
| 21 | Patterns | 70 | Top 3% | 0.5% |
| 22 | JCO Clinical Cancer Informatics | 18 | Top 1% | 0.5% |
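The statement that the top 2 journals account for 50% of the predicted mass is a cumulative sum over the ranked probabilities. Below is a small sketch that finds how many top-ranked journals are needed to reach a given share of that mass, using the ten highest probabilities from the ranking above. The helper name is hypothetical.

```python
# Predicted probabilities (%) for the ten top-ranked journals, copied from the ranking above.
top_probs = [28.7, 23.4, 10.8, 10.5, 2.8, 2.7, 2.4, 2.2, 1.8, 1.4]

def journals_to_reach(mass_pct, probs):
    """Return (k, cumulative %) for the smallest k whose top-k mass >= mass_pct."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= mass_pct:
            return k, round(total, 1)
    # The threshold was not reached within the listed journals.
    return None, round(total, 1)

print(journals_to_reach(50, top_probs))  # (2, 52.1)
print(journals_to_reach(80, top_probs))  # (7, 81.3)
```

With these values, the top two journals sum to 52.1%, which underlies the 50% statement. Reaching 80% of the predicted mass takes seven journals.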