Development and validation of an XGBoost model with SHAP-based interpretability and a web-based calculator for predicting extrauterine growth restriction in preterm infants

Xu, Z.; Yu, C.-L.; Zhang, J.-X.

2026-04-02 pediatrics
10.64898/2026.04.01.26349838 medRxiv

Background: Extrauterine growth restriction (EUGR) is a common and clinically significant complication among preterm infants, contributing to adverse neurodevelopmental and metabolic outcomes. Early and individualized risk prediction remains challenging. This study aimed to develop and validate an interpretable machine learning model for early prediction of EUGR using routinely available clinical variables, and to implement a user-friendly web-based calculator for clinical use.

Methods: We retrospectively analyzed 1,431 preterm infants admitted within 24 hours after birth to our hospital between May 2020 and March 2025. Infants from the Yangpu campus (n=863) formed the training set, and those from the Huangpu campus (n=568) formed the validation set. Early clinical variables available within 48-72 hours were screened using the Boruta algorithm. Logistic regression, XGBoost, random forest, decision tree, and support vector machine models were developed and compared. Model performance was evaluated using area under the curve (AUC), accuracy, sensitivity, specificity, F1 score, and Brier score. SHapley Additive exPlanations (SHAP) were applied to assess global and individual feature contributions, nonlinear effects, and interactions. A web-based calculator was constructed based on the optimal model.

Results: Nine variables were identified as important predictors: birth weight, small for gestational age status, gestational age, breastfeeding, multiple gestation, neonatal respiratory distress syndrome, patent ductus arteriosus, maternal hypertension, and maternal group B Streptococcus infection. Among the five models, XGBoost achieved the best performance in the validation set (AUC 0.922, accuracy 0.849, Brier score 0.108). SHAP analysis showed that low birth weight, small for gestational age, maternal group B Streptococcus infection, and patent ductus arteriosus were major risk factors, while breastfeeding was protective. Notable nonlinear and interactive effects were observed, particularly between birth weight and gestational age and between breastfeeding and patent ductus arteriosus. The web-based calculator provides real-time individualized risk estimation and visualized interpretation.

Conclusions: An interpretable XGBoost-based model and web calculator were successfully developed and validated for early prediction of EUGR in preterm infants. This tool may support clinicians in identifying high-risk infants and guiding individualized nutritional and clinical management.
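The evaluation metrics named in the Methods (AUC, accuracy, sensitivity, specificity, F1 score, Brier score) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration, and the abstract does not say which threshold or software was used.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # A tie between a positive and a negative counts as half a correct ranking.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 at an assumed threshold."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    tn = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 0)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Toy data (hypothetical, not from the study): 1 = EUGR, 0 = no EUGR.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print("AUC:", auc(y_true, y_prob))
print("Brier:", brier_score(y_true, y_prob))
print(threshold_metrics(y_true, y_prob))
```

AUC and the Brier score are computed from the predicted probabilities directly, whereas accuracy, sensitivity, specificity, and F1 depend on the chosen threshold. A lower Brier score indicates predicted probabilities that sit closer to the observed outcomes.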

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Pediatric Research: 18 papers in training set; Top 0.1%; 15.0%
2. Frontiers in Pediatrics: 29 papers in training set; Top 0.1%; 10.7%
3. The Journal of Pediatrics: 15 papers in training set; Top 0.1%; 9.4%
4. PLOS ONE: 4510 papers in training set; Top 20%; 9.4%
5. Scientific Reports: 3102 papers in training set; Top 12%; 7.3%

50% of probability mass above

6. BioData Mining: 15 papers in training set; Top 0.1%; 3.1%
7. Annals of Translational Medicine: 17 papers in training set; Top 0.4%; 2.8%
8. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 0.7%; 2.4%
9. Medicine: 30 papers in training set; Top 0.7%; 2.4%
10. Journal of the American Heart Association: 119 papers in training set; Top 2%; 2.1%
11. PLOS Digital Health: 91 papers in training set; Top 1%; 2.1%
12. Healthcare: 16 papers in training set; Top 0.4%; 1.9%
13. Journal of Clinical Medicine: 91 papers in training set; Top 3%; 1.8%
14. Frontiers in Cardiovascular Medicine: 49 papers in training set; Top 2%; 1.7%
15. Pediatric Infectious Disease Journal: 16 papers in training set; Top 0.1%; 1.7%
16. BMC Pregnancy and Childbirth: 20 papers in training set; Top 0.5%; 1.4%
17. Frontiers in Public Health: 140 papers in training set; Top 6%; 1.3%
18. Cureus: 67 papers in training set; Top 3%; 1.3%
19. PLOS Global Public Health: 293 papers in training set; Top 4%; 1.3%
20. The Journal of Clinical Endocrinology & Metabolism: 35 papers in training set; Top 1%; 0.9%
21. Kidney International Reports: 14 papers in training set; Top 0.2%; 0.9%
22. Investigative Ophthalmology & Visual Science: 37 papers in training set; Top 0.5%; 0.9%
23. Public Health Nutrition: 14 papers in training set; Top 0.5%; 0.8%
24. Critical Care: 14 papers in training set; Top 0.5%; 0.8%
25. Physiological Measurement: 12 papers in training set; Top 0.4%; 0.8%
26. JAMA Network Open: 127 papers in training set; Top 5%; 0.7%
27. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 3%; 0.5%
28. BMJ Paediatrics Open: 21 papers in training set; Top 1.0%; 0.5%
29. Stem Cell Research & Therapy: 30 papers in training set; Top 1%; 0.5%
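
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below is only an illustration of that arithmetic; the function name `journals_to_reach` is hypothetical, and only the first seven listed probabilities are included.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the seven top-ranked journals in the list.
probabilities = [15.0, 10.7, 9.4, 9.4, 7.3, 3.1, 2.8]


def journals_to_reach(mass_percent, probs):
    """Return the smallest number of top-ranked entries whose cumulative
    probability reaches mass_percent, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= mass_percent:
            return count
    return None


print(journals_to_reach(50.0, probabilities))  # prints 5
```

The first four entries sum to about 44.5%, and adding the fifth brings the total to about 51.8%, which is where the 50% marker falls.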