Development and Evaluation of Machine Learning Models to Predict Mechanical Restraint and Related Coercive Measures in Hospital Psychiatry
Kolding, S.; Damgaard, J. G.; Bernstorff, M.; Hansen, L.; Ostergaard, S. D.; Danielsen, A. A.
Introduction
Use of coercive measures in psychiatric hospitals is clinically and ethically challenging. Aiming to support prevention, we developed and evaluated machine learning models to predict both mechanical restraint and a broader composite outcome that includes related coercive measures.
Methods
The dataset comprised electronic health records (EHR) from adults (≥18 years) who had at least one admission to the Psychiatric Services in the Central Denmark Region between 2015 and 2021. For each inpatient day, XGBoost machine learning models were trained to predict either mechanical restraint or composite (mechanical, chemical, or manual) restraint within the following 48 hours. Hyperparameters were optimised for the area under the receiver operating characteristic curve (AUROC) using five-fold cross-validation on 85% of the data, and performance was validated on a held-out 15% test set.
Results
The cohort included 16,834 patients with 45,179 inpatient stays, covering 687,388 prediction days. Of these, 2,736 days were followed by a restraint episode within 48 hours, including 983 episodes of mechanical restraint. The final models were trained on 2,389 EHR-based predictors derived from demographics, diagnoses, medications, and clinical notes. The mechanical restraint model achieved an AUROC of 0.921 (95% CI: [0.918-0.922]) and a positive predictive value (PPV) of 4.9% when the top 1% of risk scores were classified as positive. The composite model achieved an AUROC of 0.912 (95% CI: [0.909-0.913]) with a PPV of 4.2% when predicting mechanical restraint, and an AUROC of 0.900 (95% CI: [0.898-0.900]) with a PPV of 10.4% when predicting composite restraint.
Conclusion
The results indicate that incorporating related coercive measures into model training did not improve discrimination (AUROC) for predicting mechanical restraint, but it did increase PPV when predicting composite restraint, reflecting the higher outcome prevalence. This suggests that leveraging related outcomes can inform the prediction of rare events and emphasises the importance of problem framing in clinical prediction modelling. Future work should include external validation across temporal, geographic, and demographic contexts.
Significant Outcomes
- A machine learning model trained solely to predict mechanical restraint achieved strong performance (AUROC 0.92), identifying nearly one-third of restraint cases at high specificity.
- Training on a broader composite outcome yielded similar discrimination when predicting mechanical restraint, while the higher base rate resulted in a higher PPV for predicting composite restraint.
- Broadening the outcome to include multiple restraint types increased the number of at-risk patients detected, owing to the higher prevalence, without compromising accuracy for mechanical restraint, suggesting shared underlying risk factors.
Limitations
- The model requires more extensive external validation to assess generalisability across time, demographic groups, and settings; such validation may be limited by regional and national differences in legislation and clinical documentation.
- Prediction performance was highest close to the restraint event, which limits early forecasting. Restricting predictions to the early phase of hospitalisation, where most restraint occurs, could raise the base rate and improve model performance.
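The two evaluation metrics reported in this abstract, AUROC and PPV when the top 1% of risk scores are classified as positive, can be illustrated with a minimal sketch. It assumes one risk score and one binary outcome label per prediction day. The helpers `ppv_at_top_fraction` and `auroc` are hypothetical names written for illustration only. The first flags the highest-scoring fraction of days as positive and returns PPV and sensitivity. The second computes AUROC as the probability that a randomly chosen positive day outranks a randomly chosen negative day. The rounding rule for the cutoff and the tie handling are assumptions, because the abstract does not describe the study's implementation, and the toy values are not study data.

```python
from typing import Sequence


def ppv_at_top_fraction(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.01
) -> tuple[float, float]:
    """Flag the top `fraction` of risk scores as positive; return (PPV, sensitivity).

    Assumption: the cutoff is round(n * fraction) days, with at least one day flagged.
    """
    n = len(scores)
    k = max(1, round(n * fraction))  # number of prediction days flagged positive
    # Rank prediction days by descending risk score and keep the top k.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    true_pos = sum(labels[i] for i in order[:k])
    total_pos = sum(labels)
    ppv = true_pos / k  # share of flagged days that were followed by the outcome
    sensitivity = true_pos / total_pos if total_pos else 0.0
    return ppv, sensitivity


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(random positive scores above random negative); ties count as 0.5.

    Quadratic pairwise version, suitable only for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 10 prediction days, 3 of which were followed by restraint.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
    ppv, sens = ppv_at_top_fraction(scores, labels, fraction=0.2)
    print(f"PPV={ppv:.2f} sensitivity={sens:.2f} AUROC={auroc(scores, labels):.3f}")
```

On the toy data, flagging the top 20% (two days) gives a PPV of 0.50 and a sensitivity of about 0.33, and the AUROC is 14/21, about 0.667. PPV is the share of flagged days that are true positives. At a fixed top-1% cutoff, a model with the same ranking quality will therefore tend to show a higher PPV when the outcome is more common. This is the base-rate effect described for the composite outcome.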