Random Forest Model for Predicting Post-Lockdown Antenatal Depression Risk: A Cross-Sectional Study of Pregnant Women in China
Pan, Y.; Lin, H.; Hirono, T.; Yang, Y.; Liu, Y.; Zhang, Y.
Background: As lockdown measures were eased, pregnant women faced an elevated risk of COVID-19 infection, which could affect their mental health. This study aimed to estimate the prevalence of antenatal depression (AD) after the lockdown and to develop machine learning models that predict AD risk.
Methods: A cross-sectional survey using the Edinburgh Postnatal Depression Scale (EPDS) was conducted in Beijing and Guizhou, China, from January to August 2023. The data were randomly split into training and test sets at a 6:4 ratio, and logistic regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Gradient Boosting Decision Tree (GBDT) models were trained and compared. The best-performing model was examined further using SHapley Additive exPlanations (SHAP) for feature importance, a calibration curve (CC) for calibration, and decision curve analysis (DCA) for clinical benefit.
Results: The effective response rate was 91.07% (459/504), and 25.7% (118/459) of respondents screened positive for AD. Multivariate analysis identified "sleep disorders," "family support level," and "COVID-19 symptom severity" as independent predictors. The RF model achieved the highest area under the curve (AUC) in both the training (0.842) and test (0.724) sets, and SHAP analysis showed that "sleep disorders" had the greatest impact on AD. The CC confirmed the RF model's calibration (P > 0.05), and the DCA showed clinical utility across threshold probability ranges of 8%-95% and 10%-58%.
Conclusions: After the lockdown, AD was strongly associated with "sleep disorders," "family support level," and "COVID-19 symptom severity," and the EPDS-based RF model effectively predicted AD risk.
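The 6:4 split and AUC-based model comparison in the Methods can be sketched in plain Python. This is a minimal illustration, not the authors' code: the abstract does not say whether the split was stratified or which seed was used, so stratification by AD status and the seed are assumptions, and `stratified_split` and `roc_auc` are hypothetical helper names. The AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation), with ties counted as one half.

```python
import random


def stratified_split(labels, train_frac=0.6, seed=0):
    """Split indices into (train, test), keeping class proportions.

    Assumption: the split is stratified by outcome; the abstract only
    states a random 6:4 split.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)  # per-class 60% cut
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Class counts taken from the abstract: 459 respondents, 118 AD-positive.
    labels = [1] * 118 + [0] * 341
    train, test = stratified_split(labels, train_frac=0.6, seed=42)
    print(len(train), len(test))  # 276 183
```

In practice each candidate model (LR, SVM, KNN, RF, XGBoost, GBDT) would be fit on the training indices and scored with `roc_auc` on both sets; scikit-learn's `train_test_split(..., test_size=0.4, stratify=y)` and `roc_auc_score` provide equivalent functionality.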
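Decision curve analysis, which the abstract uses to report threshold ranges of clinical utility, rests on the net benefit of acting on patients whose predicted risk is at or above a threshold probability p_t: NB = TP/n - (FP/n) * p_t / (1 - p_t). The sketch below assumes this standard definition; the abstract does not give the authors' exact procedure, and the function names and toy data are hypothetical. A threshold counts as "useful" here when the model's net benefit exceeds both the treat-none strategy (net benefit 0) and the treat-all strategy.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - fp / n * odds


def treat_all_net_benefit(y_true, threshold):
    """Net benefit if every patient is treated, regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def useful_thresholds(y_true, probs, thresholds):
    """Thresholds where the model beats both treat-all and treat-none."""
    return [
        t for t in thresholds
        if net_benefit(y_true, probs, t) > max(0.0, treat_all_net_benefit(y_true, t))
    ]


if __name__ == "__main__":
    # Toy illustrative data, not study data.
    y = [1, 1, 0, 0, 0]
    p = [0.8, 0.6, 0.3, 0.2, 0.1]
    print(useful_thresholds(y, p, [0.25, 0.5, 0.9]))  # [0.25, 0.5]
```

Sweeping `thresholds` over a fine grid and reporting the contiguous range that survives is one way to obtain intervals like the 8%-95% and 10%-58% ranges quoted above.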