
Machine Learning Estimation of Gestational Age at Delivery Using Linked Mother-Infant Electronic Health Records Across Two Health Systems

Bejan, C. A.; Yang, X.; Pham, A.; Qassem, L.; Abraham, A. A.; Choi, L.; Rosenbloom, S. T.; Gamire, L. X.; Phillips, E. J.

2026-05-25 obstetrics and gynecology
10.64898/2026.05.23.26353959 medRxiv

Objective: This study aimed to train and evaluate supervised machine learning algorithms using electronic health record (EHR) data to accurately estimate gestational age at delivery.

Materials and Methods: We trained random forest, gradient boosting, and ensemble models on EHR data of mother-infant dyads from Vanderbilt University Medical Center (VUMC) and replicated the analyses at University of Michigan (UMich). We further analyzed EHR predictors of gestational age, assessed temporal drift in EHR data elements, and evaluated model performance stratified by delivery status.

Results: The study included pregnancies corresponding to 54,344 and 34,345 mother-infant dyads at VUMC (2005-2025) and UMich (2012-2024), respectively. The gestational age predictions of the ensemble models achieved the highest agreement with the reference standard on the VUMC dataset (±1 week: 85.2%, ±2 weeks: 94.3%, MAE: 4.4 days) and demonstrated stronger generalization on the UMich dataset (±1 week: 93.1%, ±2 weeks: 97.8%, MAE: 2.8 days). Further, performance was better among pregnancies delivered in more recent years, and among full- and late-term deliveries compared with preterm deliveries.

Discussion: The results indicate that supervised machine learning methods leveraging linked mother-infant EHRs can accurately estimate gestational age at delivery, while demonstrating the generalizability of the modeling approach and the portability of the analytic workflow across healthcare sites.

Conclusion: This study presents a robust and generalizable machine learning framework to estimate gestational age at delivery. The framework can be reliably used to impute gestational age in large-scale, real-world clinical studies to support maternal and neonatal health research, in which accurate estimation of pregnancy onset is critical.
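The agreement figures in the Results (share of predictions within ±1 and ±2 weeks of the reference standard, plus MAE in days) can be illustrated with a minimal sketch. It assumes gestational age is expressed in days and that the ±1 week window is inclusive at 7 days; the abstract does not state either detail, and the function name `agreement_metrics` and the sample values are hypothetical, not taken from the study.

```python
def agreement_metrics(predicted_days, reference_days):
    """Summarize agreement between predicted and reference gestational age.

    Both inputs are sequences of gestational age at delivery in days
    (an assumption; the abstract does not specify the unit used internally).
    """
    if len(predicted_days) != len(reference_days) or len(predicted_days) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Absolute error in days for each pregnancy.
    errors = [abs(p - r) for p, r in zip(predicted_days, reference_days)]
    n = len(errors)

    return {
        # Fraction of predictions within 7 days (inclusive, by assumption).
        "within_1_week": sum(e <= 7 for e in errors) / n,
        # Fraction of predictions within 14 days (inclusive, by assumption).
        "within_2_weeks": sum(e <= 14 for e in errors) / n,
        # Mean absolute error in days.
        "mae_days": sum(errors) / n,
    }


# Hypothetical example values, for illustration only.
example = agreement_metrics([273, 280, 250, 266], [275, 270, 252, 266])
print(example)
```

In this toy example the absolute errors are 2, 10, 2, and 0 days, so three of four predictions fall within one week, all fall within two weeks, and the MAE is 3.5 days.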

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Journal of the American Medical Informatics Association | 61 | Top 0.1% | 32.9%
2 | Journal of Biomedical Informatics | 45 | Top 0.1% | 10.4%
3 | JAMIA Open | 37 | Top 0.1% | 10.1%
(50% of probability mass above)
4 | PLOS ONE | 4510 | Top 27% | 6.4%
5 | BMC Pregnancy and Childbirth | 20 | Top 0.3% | 3.6%
6 | Scientific Reports | 3102 | Top 37% | 3.6%
7 | BMJ Open | 554 | Top 6% | 3.1%
8 | Healthcare | 16 | Top 0.2% | 2.9%
9 | BMC Medical Education | 20 | Top 0.5% | 1.9%
10 | BMJ Open Quality | 15 | Top 0.4% | 1.8%
11 | BMC Medical Research Methodology | 43 | Top 0.6% | 1.7%
12 | JAMA Network Open | 127 | Top 3% | 1.5%
13 | Frontiers in Public Health | 140 | Top 6% | 1.3%
14 | BMC Medical Informatics and Decision Making | 39 | Top 2% | 1.2%
15 | International Journal of Medical Informatics | 25 | Top 1% | 1.1%
16 | PLOS Global Public Health | 293 | Top 5% | 0.9%
17 | Frontiers in Digital Health | 20 | Top 1% | 0.9%
18 | Heliyon | 146 | Top 5% | 0.9%
19 | JMIR Formative Research | 32 | Top 1% | 0.9%
20 | PLOS Digital Health | 91 | Top 2% | 0.9%
21 | npj Digital Medicine | 97 | Top 3% | 0.9%
22 | BMC Health Services Research | 42 | Top 2% | 0.8%
23 | DIGITAL HEALTH | 12 | Top 0.6% | 0.8%
24 | Computational and Structural Biotechnology Journal | 216 | Top 8% | 0.8%
25 | The Journal of Clinical Endocrinology & Metabolism | 35 | Top 1% | 0.8%
26 | JMIR Research Protocols | 18 | Top 2% | 0.7%
27 | Journal of Medical Internet Research | 85 | Top 5% | 0.7%
28 | The Lancet Digital Health | 25 | Top 1% | 0.6%