
Development and Validation of Machine Learning-Based Prediction of Depression Progression Using EHR Data: A Multi-Institutional Retrospective Cohort Study

Ahadian, P.; Fragano, A.; Guan, T.; Guan, Q.; Shalhout, S. Z.

2025-12-01 psychiatry and clinical psychology
10.1101/2025.11.28.25341207 medRxiv

Background: Depression is a leading cause of global disability. Timely identification of patients at risk for clinical worsening remains a major challenge. Electronic health records (EHRs) facilitate large-scale, real-world analyses of disease trajectories. However, standardized symptom scale data such as the Patient Health Questionnaire-9 are often unavailable or recorded only as unstructured text. In this context, severity progression based on International Classification of Diseases (ICD10) diagnostic codes provides a pragmatic alternative for developing predictive tools to identify worsening depression.

Objective: We aim to develop and evaluate machine-learning and deep-learning models for predicting ICD10-defined progression from mild to moderate/severe depression using EHR data curated by the MedStar Health Research Institute (MHRI).

Methods: We conducted a multi-institutional retrospective cohort analysis using the MHRI EHR database, which integrates data from 10 hospitals and 300 outpatient sites across the mid-Atlantic. Adults (≥18 years) with an initial ICD10 diagnosis of mild depression between 2017 and 2023 were included (N=2131). Nonprogressors were defined as patients whose mild major depressive disorder remained mild for 24 months (N=270). Progressors were defined as patients who developed moderate or severe ICD10 depression within 24 months of the index diagnosis (N=533). Data were stratified and split into training (60%), validation (20%), and test (20%) subsets. A heterogeneous feature set spanning demographics, healthcare utilization, socioeconomic indices, diagnostic context, and laboratory measurements was available. Logistic regression used elastic net regularization with fivefold cross-validation, and random forest hyperparameters were tuned by grid search. XGBoost, CatBoost, and a deep neural network (DNN) were trained with standard learning rate, depth, class weighting, and early stopping. A deterministic top-model selection framework applied prespecified thresholds of sensitivity at least 0.70 and AUC at least 0.70, and composite rankings integrated accuracy, sensitivity, specificity, and the overfitting gap.

Results: The analytic cohort included 803 patients with complete two-year follow-up. Under the selection criteria, the DNN failed to meet the AUC threshold (0.671) and was excluded. Among the remaining models, XGBoost achieved the top composite score (accuracy = 0.72; AUC = 0.776; sensitivity = 0.77; specificity = 0.63; overfit gap = 0.112). Logistic regression ranked second (accuracy = 0.71; AUC = 0.797; sensitivity = 0.79; specificity = 0.61; overfit gap = 0.052), followed by CatBoost and random forest, the latter penalized for overfitting (gap = 0.278). The TinyLlama audit note, generated through a local Hugging Face pipeline, confirmed XGBoost as the most balanced model.

Conclusions: Using EHR data from a multi-institutional regional health system, we developed and validated machine-learning models that predicted progression of depression. XGBoost demonstrated the most reliable composite performance. These findings support the feasibility of leveraging socioeconomic and EHR data to predict worsening depression and emphasize the importance of transparent model-selection frameworks for trustworthy clinical artificial intelligence.
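
The threshold step of the selection framework described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `ModelMetrics`, `passes_gate`, and `eligible` are hypothetical, and only metrics reported in the abstract are filled in (unreported values are left as `None`). The abstract does not state how the composite score is computed, so the sketch stops at the gate and does not attempt to reproduce the ranking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelMetrics:
    """Test-set metrics for one candidate model (hypothetical container)."""
    name: str
    auc: float
    sensitivity: Optional[float] = None  # None means not reported in the abstract


def passes_gate(m: ModelMetrics, min_sens: float = 0.70, min_auc: float = 0.70) -> bool:
    """Apply the prespecified thresholds: sensitivity >= 0.70 and AUC >= 0.70."""
    if m.auc < min_auc:
        return False  # fails on AUC alone, whatever the sensitivity is
    # A model with no reported sensitivity cannot be confirmed as passing.
    return m.sensitivity is not None and m.sensitivity >= min_sens


# Values taken from the abstract; the DNN's sensitivity is not reported there.
candidates = [
    ModelMetrics("XGBoost", auc=0.776, sensitivity=0.77),
    ModelMetrics("Logistic regression", auc=0.797, sensitivity=0.79),
    ModelMetrics("DNN", auc=0.671),
]

# Names of the models that survive the gate, in input order.
eligible = [m.name for m in candidates if passes_gate(m)]
print(eligible)
```

With the reported values, the DNN is dropped on the AUC threshold while XGBoost and logistic regression remain eligible for the composite ranking.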

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. npj Digital Medicine | 97 papers in training set | Top 0.3% | 18.2%
2. Journal of Affective Disorders | 81 papers in training set | Top 0.2% | 14.0%
3. Acta Psychiatrica Scandinavica | 10 papers in training set | Top 0.1% | 9.9%
4. European Psychiatry | 10 papers in training set | Top 0.1% | 6.2%
5. Journal of Medical Internet Research | 85 papers in training set | Top 1.0% | 4.7%
50% of probability mass above
6. JAMA Network Open | 127 papers in training set | Top 0.6% | 4.7%
7. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.6% | 4.2%
8. BJPsych Open | 25 papers in training set | Top 0.2% | 2.5%
9. PLOS ONE | 4510 papers in training set | Top 45% | 2.5%
10. Frontiers in Psychiatry | 83 papers in training set | Top 1% | 2.5%
11. Translational Psychiatry | 219 papers in training set | Top 2% | 2.4%
12. Acta Neuropsychiatrica | 12 papers in training set | Top 0.4% | 1.8%
13. BMJ Mental Health | 15 papers in training set | Top 0.2% | 1.7%
14. Frontiers in Digital Health | 20 papers in training set | Top 0.8% | 1.5%
15. BMC Medicine | 163 papers in training set | Top 4% | 1.5%
16. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.9%
17. PLOS Medicine | 98 papers in training set | Top 4% | 0.9%
18. JMIR Formative Research | 32 papers in training set | Top 1% | 0.9%
19. Biological Psychiatry | 119 papers in training set | Top 2% | 0.9%
20. BMJ Open | 554 papers in training set | Top 12% | 0.8%
21. Nature Medicine | 117 papers in training set | Top 5% | 0.8%
22. American Journal of Psychiatry | 20 papers in training set | Top 0.5% | 0.7%
23. JMIR Public Health and Surveillance | 45 papers in training set | Top 4% | 0.7%
24. International Journal of Medical Informatics | 25 papers in training set | Top 2% | 0.7%
25. Scientific Reports | 3102 papers in training set | Top 75% | 0.7%
26. Frontiers in Artificial Intelligence | 18 papers in training set | Top 1.0% | 0.6%
27. Psychological Medicine | 74 papers in training set | Top 2% | 0.6%
28. JAMA Psychiatry | 13 papers in training set | Top 0.7% | 0.6%
29. Biological Psychiatry Global Open Science | 54 papers in training set | Top 2% | 0.6%
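
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below is a hedged illustration in Python: the helper name `journals_to_reach` is hypothetical, and the input is simply the listed probabilities for the five highest-ranked journals.

```python
# Predicted probabilities (%) for the five highest-ranked journals in the list.
top_probs = [18.2, 14.0, 9.9, 6.2, 4.7]


def journals_to_reach(probs, target):
    """Return (count, cumulative) for the shortest ranked prefix whose
    cumulative probability reaches `target`, or None if it never does."""
    running = 0.0
    for count, p in enumerate(probs, start=1):
        running += p
        if running >= target:
            return count, running
    return None


result = journals_to_reach(top_probs, 50.0)
print(result)
```

The first four journals sum to about 48.3%, just under the threshold, so the fifth is the one that carries the cumulative total past 50% (to roughly 53%).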