Machine Learning Estimation of Gestational Age at Delivery Using Linked Mother-Infant Electronic Health Records Across Two Health Systems

Bejan, C. A.; Yang, X.; Pham, A.; Qassem, L.; Abraham, A. A.; Choi, L.; Rosenbloom, S. T.; Gamire, L. X.; Phillips, E. J.

2026-05-25 obstetrics and gynecology

10.64898/2026.05.23.26353959 medRxiv

Objective: This study aimed to train and evaluate supervised machine learning algorithms that use electronic health record (EHR) data to accurately estimate gestational age at delivery.

Materials and Methods: We trained random forest, gradient boosting, and ensemble models on EHR data of mother-infant dyads from Vanderbilt University Medical Center (VUMC) and replicated the analyses at the University of Michigan (UMich). We further analyzed EHR predictors of gestational age, assessed temporal drift in EHR data elements, and evaluated model performance stratified by delivery status.

Results: The study included pregnancies corresponding to 54,344 mother-infant dyads at VUMC (2005-2025) and 34,345 at UMich (2012-2024). The ensemble models' gestational age predictions achieved the highest agreement with the reference standard on the VUMC dataset (±1 week: 85.2%; ±2 weeks: 94.3%; mean absolute error [MAE]: 4.4 days) and generalized well to the UMich dataset, where agreement was even higher (±1 week: 93.1%; ±2 weeks: 97.8%; MAE: 2.8 days). Performance was also better for pregnancies delivered in more recent years and for full- and late-term deliveries than for preterm deliveries.

Discussion: The results indicate that supervised machine learning methods leveraging linked mother-infant EHRs can accurately estimate gestational age at delivery, and they demonstrate both the generalizability of the modeling approach and the portability of the analytic workflow across healthcare sites.

Conclusion: This study presents a robust, generalizable machine learning framework for estimating gestational age at delivery. The framework can be reliably used to impute gestational age in large-scale, real-world clinical studies that support maternal and neonatal health research, where accurate estimation of pregnancy onset is critical.
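
The agreement metrics reported in the Results can be illustrated with a short sketch. It assumes that predicted and reference gestational ages are both expressed in days, that "±1 week" and "±2 weeks" mean an absolute error of at most 7 and 14 days, respectively, and that MAE is the mean absolute error in days. The function name `agreement_metrics` is hypothetical and is not taken from the authors' code.

```python
def agreement_metrics(predicted_days, reference_days):
    """Hypothetical helper: percentage of predictions within ±7 and ±14 days
    of the reference gestational age, plus the mean absolute error in days."""
    if not predicted_days or len(predicted_days) != len(reference_days):
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute error per pregnancy, in days (assumed unit).
    errors = [abs(p - r) for p, r in zip(predicted_days, reference_days)]
    n = len(errors)
    within_1_week = 100.0 * sum(e <= 7 for e in errors) / n
    within_2_weeks = 100.0 * sum(e <= 14 for e in errors) / n
    mae = sum(errors) / n
    return within_1_week, within_2_weeks, mae


# Toy example with made-up values (not study data).
print(agreement_metrics([273, 280, 259, 287], [275, 270, 259, 301]))
```

With the toy inputs the absolute errors are 2, 10, 0, and 14 days, so the call prints `(50.0, 100.0, 6.5)`: half the predictions fall within one week, all fall within two weeks, and the MAE is 6.5 days.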