Beyond AI Psychosis and Sycophancy: Structural Drift as a System-Level Safety Failure
Kim, J. E.; Holbrook, E. B.; Hron, J. D.; Parsons, C. R.
Background: Conversational AI safety systems are primarily evaluated using message-level content monitoring, which assesses inputs and outputs in isolation. This message-by-message approach can miss interaction-level risks that emerge over extended conversations, including patterns discussed in reports of "AI psychosis." Critically, by the time users express overt psychosis-spectrum content, opportunities for intervention may be limited.
Objective: We investigated whether LLM responses gradually expand and connect interpretations beyond the users' original concerns, a process we term structural drift. We also tested whether this drift can be detected early and automatically.
Methods: We developed an automated, LLM-adapted, rubric-based prompt covering seven domains of anomalous (psychosis-spectrum) experience, derived from phenomenological psychiatry to capture subtle shifts in subjective interpretation. In Part 1, we evaluated the rubric against gold-standard text excerpts (N = 484) adapted from clinically validated qualitative instruments. In Part 2, we analyzed 1,290 user-LLM response exchanges from 7 dialogues run on 3 different LLMs (5 repeats each) to measure (i) domain amplification (increasing scores within a domain) and (ii) domain expansion (new domains appearing over time).
Results: Automated scoring showed strong agreement with the gold-standard excerpts (domain accuracy 82.7-98.9%; exact 0-3 agreement 63.6-82.7%). Across dialogues, we observed significant amplification in four domains (p < .05; d = 0.14-0.46) and domain expansion in 83.8% of dialogues (88/105; p < .001).
Conclusions: AI responses can systematically expand and intensify users' descriptions beyond their initial input. Taken together with predictive-processing accounts of psychosis, these findings suggest that the exposure itself may reinforce maladaptive inferences. Because drift is detectable from ordinary dialogue without clinical-style probing, structural drift detection may support scalable, real-time monitoring for emerging risks before overt escalation.
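The two drift measures named in the Methods, together with exact 0-3 agreement, can be illustrated with a minimal Python sketch. It assumes each LLM response has already been scored 0-3 per domain (the rubric-based LLM scoring step is not shown) and uses placeholder domain labels. The criteria are deliberately simplified assumptions: amplification is taken as the first-to-last score change within a domain, and expansion as any domain scoring zero in the first turn that scores above zero later. The function names and criteria are hypothetical illustrations, not the authors' implementation or statistical tests.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: one dict per scored LLM response, mapping a
# placeholder domain label to its rubric score on the 0-3 scale.
Scores = Dict[str, int]


def amplification(turns: Sequence[Scores]) -> Dict[str, int]:
    """Per-domain score change from the first to the last scored turn.

    Assumed, simplified criterion: a positive value counts as amplification.
    Domains missing from a turn are treated as scoring 0.
    """
    if not turns:
        raise ValueError("need at least one scored turn")
    first, last = turns[0], turns[-1]
    domains = set(first) | set(last)
    return {d: last.get(d, 0) - first.get(d, 0) for d in sorted(domains)}


def expansion(turns: Sequence[Scores]) -> List[str]:
    """Domains scoring 0 (or missing) in the first turn that score > 0 later."""
    if not turns:
        raise ValueError("need at least one scored turn")
    initial = {d for d, s in turns[0].items() if s > 0}
    later = {d for t in turns[1:] for d, s in t.items() if s > 0}
    return sorted(later - initial)


def exact_agreement(auto: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of excerpts where the automated 0-3 score equals the gold score."""
    if len(auto) != len(gold) or not auto:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(a == g for a, g in zip(auto, gold)) / len(auto)


if __name__ == "__main__":
    dialogue = [
        {"domain_A": 1, "domain_B": 0},
        {"domain_A": 2, "domain_B": 0},
        {"domain_A": 3, "domain_B": 1},
    ]
    print(amplification(dialogue))  # {'domain_A': 2, 'domain_B': 1}
    print(expansion(dialogue))      # ['domain_B']
    print(exact_agreement([0, 1, 2, 3], [0, 1, 3, 3]))  # 0.75
```

A first-to-last difference is the crudest possible amplification signal; a per-domain regression slope across turns, or a paired test across repeats, would be natural refinements when estimating effect sizes such as those reported above.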