Predicting Intentional Self-Harm Following Psychiatric Discharge in Catalonia, Spain: Machine Learning Models from Linked Registry Data
Alayo, I.; Pujol, O.; Amigo, F.; Ballester, L.; Cirici Amell, R.; Contaldo, S. F.; Ferrer, M.; Guinart, D.; Latorre, L.; Leis, A.; Lopez Fernandez, M.; Mayer, M. A.; Pastor, M.; Pena-Salazar, C.; Portillo-Van Diest, A.; Ramirez-Anguita, J. M.; Sanz, F.; Alonso, J.; Kessler, R. C.; Mehlum, L.; Palao, D.; Perez Sola, V.; Vilagut, G.; Mortier, P.
Introduction: Patients recently discharged from psychiatric hospitalization are at increased risk of intentional self-harm, including suicide. Using linked population-based registry data from Catalonia, Spain, we developed machine learning-based prediction models for post-discharge intentional self-harm across follow-up horizons, sexes, and age groups, and evaluated their generalizability and robustness with multiple validation strategies.
Methods: This retrospective cohort study included 41,827 individuals accounting for 71,865 psychiatric hospitalizations with discharge at age ≥10 years between January 1, 2015, and December 31, 2018, in Catalonia, Spain, with follow-up until December 31, 2019. The primary outcome was intentional self-harm (fatal or non-fatal) within 7, 30, 90, 180, and 365 days post-discharge. Models incorporated 247 predictors from electronic health records, including sociodemographic characteristics, mental and physical disorder categories, categories of dispensed psychotropic medication, and history of self-harm and psychiatric hospitalization. Model performance was evaluated using the area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR). Predictor importance was assessed using Shapley Additive Explanations (SHAP).
Results: Within 365 days, 4,901 hospitalizations (6.8%) were followed by intentional self-harm. The 365-day model trained on the full cohort achieved an AUCROC of 0.819 in the test sample, with adjusted AUCPR indicating a median 5.4-fold improvement over baseline prevalence. This model generalized well across event horizons and sex-age strata, outperforming subgroup-specific models when data sparsity limited performance. Separate models trained by event horizon and stratified by sex and by sex-age group achieved a median AUCROC of 0.775 (IQR 0.764-0.808), with adjusted AUCPR indicating a median 5.4-fold improvement over baseline prevalence (IQR 4.5-6.2).
Key predictors included the recency of the last registered diagnosis of depressive episodes, recurrent depression, adjustment disorders, and schizophrenia, as well as recent SSRI dispensation and the number of childhood-onset disorder and musculoskeletal disease diagnoses in the previous five years. Predictor importance varied considerably across sex-age strata, with smaller differences across horizons. Under subject-level and temporal split validation, performance was lower (AUCROC 0.711-0.746), though estimates remained clinically informative (2.8- to 3.1-fold improvement over baseline prevalence).
Conclusions: Machine learning models using routinely collected health records predicted intentional self-harm after psychiatric hospitalization with good discrimination and clinically meaningful precision-recall performance. A single 365-day model generalized well across horizons and demographic groups, suggesting that one broadly trained model may provide a pragmatic and scalable approach for clinical implementation.
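For readers less familiar with the evaluation terms used in the abstract, the following is a minimal illustrative sketch in Python, not the authors' code. It shows how AUCROC, an average-precision estimate of AUCPR, and a fold improvement over baseline prevalence could be computed, and what a subject-level split looks like when one person contributes several hospitalizations. The abstract does not define how the adjusted AUCPR was calculated, so dividing AUCPR by outcome prevalence is an assumption here; all function names are hypothetical.

```python
import random


def auc_roc(y_true, scores):
    """AUCROC via the rank-sum identity: the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (tied scores share their average rank)."""
    pairs = sorted(zip(scores, y_true))
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(y_true, scores):
    """Average precision, a common estimate of AUCPR: the mean of the
    precision values at the rank of each true positive.
    Tied scores are not handled specially in this sketch."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    hits, total = 0, 0.0
    for rank, k in enumerate(order, start=1):
        if y_true[k] == 1:
            hits += 1
            total += hits / rank
    return total / sum(y_true)


def fold_improvement(y_true, scores):
    """ASSUMPTION: fold improvement = AUCPR / outcome prevalence, since a
    random classifier's expected AUCPR equals the prevalence."""
    prevalence = sum(y_true) / len(y_true)
    return average_precision(y_true, scores) / prevalence


def subject_level_split(subject_ids, test_fraction=0.2, seed=0):
    """Split row indices so that all hospitalizations of one subject land
    on the same side, avoiding leakage between train and test."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


if __name__ == "__main__":
    # Toy labels and scores, purely synthetic.
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auc_roc(y, s), average_precision(y, s), fold_improvement(y, s))
    # Baseline prevalence reported in the abstract: 4,901 of 71,865.
    print(round(4901 / 71865, 3))
```

Under this reading, a 5.4-fold improvement at the reported 6.8% prevalence would correspond to an AUCPR of roughly 0.37. A temporal split would follow the same pattern as the subject-level split, except that rows would be partitioned by discharge date instead of by subject.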