
Predicting long-term adverse outcomes after neonatal intensive care

Öğretir, M.; Kaipainen, V.; Leskinen, M.; Lähdesmäki, H.; Koskinen, M.

Posted 2026-03-31 · pediatrics
medRxiv · DOI: 10.64898/2026.03.26.26348580

Neonates requiring intensive care are at increased risk for long-term neuropsychiatric disorders. However, clinical adoption of risk prediction models remains limited when their predictions lack the interpretability needed for informed clinical decision-making. Here, we investigated whether longitudinal neonatal electronic health record (EHR) data from the first 90 days of life can support clinically meaningful interpretation of long-term risk signals for major neuropsychiatric diagnoses by age seven. In a retrospective register-based cohort of 17,655 at-risk children from an academic medical center, of whom 8.0% (1,420) received a major neuropsychiatric diagnosis during follow-up, we applied a time-aware transformer model (Self-supervised Transformer for Time-Series; STraTS) and thoroughly evaluated its predictions using three complementary interpretability approaches: perturbation-based variable importance, value-dependent effect analysis, and leave-one-out (LOO) feature attribution. STraTS achieved the highest area under the precision–recall curve (AUPRC 0.171 ± 0.022), compared with Random Forest (0.166 ± 0.008), logistic regression (0.151 ± 0.007), and XGBoost (0.128 ± 0.010). Across interpretability methods, five predictors were consistently identified: birth weight, gender, Apgar score at 1 minute, umbilical serum thyroid stimulating hormone (uS-TSH), and treatment time in hospital. Indicators of early clinical severity, including chromosomal abnormalities and neonatal cerebral-status disturbances, showed the largest risk-increasing effects. Furthermore, the model's learned vector representations of subject-specific EHR sequences formed clinically coherent latent embeddings that reflect population heterogeneity along established perinatal risk dimensions.
These findings demonstrate that combining multiple complementary interpretability methods yields stable, clinically plausible risk signals while revealing limitations that would remain undetected by any single approach, highlighting the importance of careful interpretability analysis of deep learning-based risk predictions.
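The perturbation-based variable importance described in the abstract can be illustrated with a small sketch: permute one input variable at a time and record the resulting drop in AUPRC. The model, synthetic data, and class balance below are illustrative stand-ins, not the authors' STraTS pipeline or cohort.

```python
# Sketch of perturbation-based variable importance scored by AUPRC.
# Synthetic data with ~8% positives, loosely mirroring the cohort's outcome rate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=5,
                           weights=[0.92, 0.08], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

baseline = average_precision_score(y, model.predict_proba(X)[:, 1])
importance = {}
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # perturb one variable at a time
    auprc = average_precision_score(y, model.predict_proba(Xp)[:, 1])
    importance[j] = baseline - auprc       # drop in AUPRC = importance

for j, drop in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"feature {j}: AUPRC drop {drop:+.3f}")
```

A larger drop means the model leans more heavily on that variable; in the paper this ranking is cross-checked against value-dependent effects and LOO attribution rather than trusted on its own.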

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|---|---|---|---|---|
| 1 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 0.1% | 10.5% |
| 2 | Nature Medicine | 117 | Top 0.1% | 10.1% |
| 3 | npj Digital Medicine | 97 | Top 0.6% | 9.2% |
| 4 | Genome Medicine | 154 | Top 0.6% | 8.4% |
| 5 | Cell Reports Medicine | 140 | Top 0.4% | 6.4% |
| 6 | Scientific Reports | 3102 | Top 24% | 4.9% |
| 7 | Science Translational Medicine | 111 | Top 0.5% | 4.3% |
| 8 | BioData Mining | 15 | Top 0.1% | 3.6% |
| 9 | The Journal of Pediatrics | 15 | Top 0.2% | 3.6% |
| 10 | Proceedings of the National Academy of Sciences | 2130 | Top 24% | 2.7% |
| 11 | eBioMedicine | 130 | Top 0.5% | 2.7% |
| 12 | Nature Communications | 4913 | Top 43% | 2.7% |
| 13 | Annals of Neurology | 57 | Top 0.7% | 2.6% |
| 14 | Translational Psychiatry | 219 | Top 2% | 2.1% |
| 15 | Pediatric Research | 18 | Top 0.2% | 1.8% |
| 16 | NeuroImage: Clinical | 132 | Top 2% | 1.7% |
| 17 | PLOS Digital Health | 91 | Top 1% | 1.7% |
| 18 | BMC Medicine | 163 | Top 4% | 1.7% |
| 19 | Science Bulletin | 22 | Top 0.4% | 1.5% |
| 20 | PLOS ONE | 4510 | Top 66% | 0.8% |
| 21 | Journal of Biomedical Informatics | 45 | Top 1% | 0.8% |
| 22 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.6% |
| 23 | JAMA Network Open | 127 | Top 5% | 0.6% |
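The claim that the top 7 journals cover 50% of the predicted probability mass can be checked directly with a few lines of plain Python, using the probabilities transcribed from the list above:

```python
# Cumulative probability mass over the ranked journal predictions (percent),
# values transcribed from the list above.
probs = [10.5, 10.1, 9.2, 8.4, 6.4, 4.9, 4.3, 3.6, 3.6, 2.7, 2.7, 2.7,
         2.6, 2.1, 1.8, 1.7, 1.7, 1.7, 1.5, 0.8, 0.8, 0.6, 0.6]
cum = 0.0
for rank, p in enumerate(probs, start=1):
    cum += p
    if cum >= 50.0:
        print(f"top {rank} journals reach {cum:.1f}% of probability mass")
        break
# → top 7 journals reach 53.8% of probability mass
```

Ranks 1–6 sum to only 49.5%, so the 50% threshold is indeed first crossed at rank 7.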