
Predicting Depressive Symptoms Among Reproductive-Aged Women in Bangladesh Using Bagging Ensemble Machine Learning on Imbalanced Bangladesh Demographic and Health Survey 2022 Data

Mahmud, S.; Akter, M. S.; Ahamed, B.; Rahman, A. E.; El Arifeen, S.; Hossain, A. T.

2026-04-23 public and global health
10.64898/2026.04.22.26351445 medRxiv

Background Depressive symptoms among reproductive-aged women are a major public health concern in low- and middle-income countries, yet systematic screening remains limited. In most population survey datasets, the low prevalence of depression produces severe class imbalance, which challenges conventional machine learning models. We therefore develop and evaluate a bagging-based ensemble machine learning framework to predict depressive symptoms among reproductive-aged women using highly imbalanced Bangladesh Demographic and Health Survey (BDHS) 2022 data.
Methods The sample comprised women aged 15-49 years drawn from BDHS 2022 data. Depressive symptoms were defined as a Patient Health Questionnaire (PHQ-9) score ≥ 10. Candidate predictors were drawn from sociodemographic, reproductive, nutritional, psychosocial, healthcare access, and environmental domains. Feature selection was performed using Elastic Net (EN), Random Forest (RF), and XGBoost models. Five classifiers (EN, RF, Support Vector Machine (SVM), K-nearest neighbors (KNN), and Gradient Boosting Machine (GBM)) were trained using both oversampling-based approaches and the proposed ensemble framework. Model performance was evaluated on an independent test set using accuracy, sensitivity, specificity, F1-score, and the normalized Matthews correlation coefficient (normMCC).
Results Approximately 4.8% of women were identified with depressive symptoms. The proposed bagging ensemble framework consistently achieved more balanced predictive performance than oversampling-based models. Average normMCC improved from 0.540 (oversampling) to 0.557 (ensemble). RF and GBM ensembles demonstrated notable improvements in identifying depressive cases, while the EN ensemble achieved the highest overall performance and sensitivity. Threshold optimization yielded stable normMCC across models, indicating robust trade-offs between sensitivity and specificity.
Conclusions Bagging-based ensemble learning provides a more robust and balanced approach than synthetic oversampling for predicting depressive symptoms in highly imbalanced population survey data. This approach has important implications for improving early identification and population-level mental health surveillance in resource-constrained settings.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1 | PLOS ONE | 4510 papers in training set | Top 7% | 22.2%
2 | Journal of Medical Internet Research | 85 papers in training set | Top 0.8% | 6.2%
3 | Journal of Affective Disorders | 81 papers in training set | Top 0.5% | 4.1%
4 | BMC Medical Research Methodology | 43 papers in training set | Top 0.2% | 3.9%
5 | Scientific Reports | 3102 papers in training set | Top 32% | 3.9%
6 | Frontiers in Public Health | 140 papers in training set | Top 2% | 3.9%
7 | JMIR Public Health and Surveillance | 45 papers in training set | Top 0.7% | 3.5%
8 | BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.9% | 3.5%
50% of probability mass above
9 | Frontiers in Psychiatry | 83 papers in training set | Top 1% | 3.5%
10 | BMC Public Health | 147 papers in training set | Top 2% | 3.5%
11 | Wellcome Open Research | 57 papers in training set | Top 0.5% | 2.3%
12 | BMC Medicine | 163 papers in training set | Top 3% | 2.0%
13 | Journal of Public Health | 23 papers in training set | Top 0.2% | 2.0%
14 | PLOS Global Public Health | 293 papers in training set | Top 3% | 1.9%
15 | International Journal of Epidemiology | 74 papers in training set | Top 1% | 1.9%
16 | International Journal of Medical Informatics | 25 papers in training set | Top 0.9% | 1.7%
17 | BMJ Open | 554 papers in training set | Top 9% | 1.7%
18 | PeerJ | 261 papers in training set | Top 8% | 1.6%
19 | Journal of Biomedical Informatics | 45 papers in training set | Top 0.9% | 1.5%
20 | American Journal of Epidemiology | 57 papers in training set | Top 0.8% | 1.5%
21 | BMC Research Notes | 29 papers in training set | Top 0.2% | 1.3%
22 | BMC Infectious Diseases | 118 papers in training set | Top 4% | 1.3%
23 | International Journal of Environmental Research and Public Health | 124 papers in training set | Top 5% | 1.2%
24 | Disaster Medicine and Public Health Preparedness | 16 papers in training set | Top 1% | 1.2%
25 | JMIR Research Protocols | 18 papers in training set | Top 1% | 0.9%
26 | Journal of Global Health | 18 papers in training set | Top 0.6% | 0.7%
27 | PLOS Digital Health | 91 papers in training set | Top 3% | 0.7%
28 | Social Science & Medicine | 15 papers in training set | Top 1.0% | 0.7%
29 | JMIR Formative Research | 32 papers in training set | Top 2% | 0.7%
30 | JMIRx Med | 31 papers in training set | Top 2% | 0.7%