An independent supervisory safety agent improves reaction of large language models to suicidal ideation
Trivedi, S.; Simons, N. W.; Tyagi, A.; Ramaswamy, A.; Nadkarni, G. N.; Charney, A. W.
Background: Large language models (LLMs) are increasingly used in mental health contexts, yet their detection of suicidal ideation is inconsistent, raising patient safety concerns. Objective: To evaluate whether an independent safety monitoring system improves detection of suicide risk compared with native LLM safeguards. Methods: We conducted a cross-sectional evaluation using 224 paired suicide-related clinical vignettes presented in a single-turn format under two conditions (with and without structured clinical information). Native LLM safeguard responses were compared with an independent supervisory safety architecture with asynchronous monitoring. The primary outcome was detection of suicide risk requiring intervention. Results: The supervisory system detected suicide risk in 205 of 224 evaluations (91.5%) versus 41 of 224 (18.3%) for native LLM safeguards. Among 168 discordant evaluations, 166 favored the supervisory system and 2 favored the LLM (matched odds ratio ≈ 83.0). Both systems detected risk in 39 evaluations, and neither in 17. Detection was highest in scenarios with explicit suicidal ideation and lower in more ambiguous presentations. Conclusions: Native LLM safeguards frequently failed to detect suicide risk in this structured evaluation. An independent monitoring approach substantially improved detection, supporting the role of external safety systems in high-risk mental health applications of LLMs.
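The four paired counts reported in the abstract (both, supervisory only, LLM only, neither) fully determine the headline figures. A minimal arithmetic sketch, using only the counts stated above, shows how the detection rates and the matched odds ratio follow; variable names are illustrative, not taken from the study:

```python
# Paired (McNemar-style) 2x2 counts, as reported in the abstract.
both = 39               # risk detected by both systems
supervisory_only = 166  # discordant evaluations favoring the supervisory system
llm_only = 2            # discordant evaluations favoring the native LLM safeguards
neither = 17            # risk detected by neither system

total = both + supervisory_only + llm_only + neither  # 224 evaluations

supervisory_detections = both + supervisory_only  # 205 of 224 (91.5%)
llm_detections = both + llm_only                  # 41 of 224 (18.3%)

# For paired data, the matched odds ratio uses only the discordant pairs.
matched_or = supervisory_only / llm_only

print(total, supervisory_detections, llm_detections, matched_or)
# → 224 205 41 83.0
```

The discordant-pair ratio 166/2 reproduces the reported matched odds ratio of approximately 83.0, and both marginal detection counts match the abstract.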