
Beyond AI Psychosis and Sycophancy: Structural Drift as a System-Level Safety Failure

Kim, J. E.; Holbrook, E. B.; Hron, J. D.; Parsons, C. R.

2026-03-19 health informatics
10.64898/2026.03.19.26346371 medRxiv

Background: Conversational AI safety systems are primarily evaluated using message-level content monitoring, which assesses inputs and outputs in isolation. This message-by-message approach can miss interaction-level risks that emerge over extended conversations, including patterns discussed in reports of "AI psychosis." Critically, by the time users express overt psychosis-spectrum content, opportunities for intervention may be limited.

Objective: We investigated whether LLM responses gradually expand and connect interpretations beyond the user's original concerns, a process we term structural drift. We also tested whether this drift can be detected early and automatically.

Methods: We developed an automated, LLM-adapted, rubric-based prompt for seven domains of anomalous (psychosis-spectrum) experience, derived from phenomenological psychiatry to capture subtle shifts in subjective interpretation. In Part 1, we evaluated the rubric using gold-standard text excerpts (N = 484) adapted from clinically validated qualitative instruments. In Part 2, we analyzed 1,290 user-LLM response exchanges from 7 dialogues, using 3 different LLMs (5 repeats each), to measure (i) domain amplification (increasing score within a domain) and (ii) domain expansion (new domains appearing over time).

Results: Automated scoring showed strong agreement with gold-standard excerpts (domain accuracy 82.7-98.9%; exact 0-3 agreement 63.6-82.7%). Across dialogues, we observed significant amplification in four domains (p < .05; d = 0.14-0.46) and domain expansion in 83.8% of dialogues (88/105; p < .001).

Conclusions: AI responses can systematically expand and intensify users' descriptions beyond their initial input. Taken together with predictive-processing accounts of psychosis, exposure to such responses may itself reinforce maladaptive inferences. Because drift is detectable from ordinary dialogue without clinical-style probing, structural drift detection may support scalable, real-time monitoring for emerging risks before overt escalation.
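
The two drift measures named in the Methods can be illustrated with a minimal sketch. It assumes each LLM response has already been scored 0-3 on each of the seven rubric domains. The function names, the one-turn baseline, and the early-half versus late-half comparison are illustrative assumptions; the abstract does not specify how amplification and expansion were operationalized.

```python
from typing import List

N_DOMAINS = 7  # the abstract reports seven rubric domains, each scored 0-3


def domain_expansion(turn_scores: List[List[int]], baseline_turns: int = 1) -> List[int]:
    """Indices of domains scored 0 throughout the baseline turns but > 0 later.

    Assumption: "new domains appearing over time" is read as "absent at the
    start of the dialogue, present in a later turn".
    """
    baseline = turn_scores[:baseline_turns]
    later = turn_scores[baseline_turns:]
    at_start = {d for row in baseline for d, s in enumerate(row) if s > 0}
    appeared = {d for row in later for d, s in enumerate(row) if s > 0}
    return sorted(appeared - at_start)


def domain_amplification(turn_scores: List[List[int]], domain: int) -> float:
    """Mean score in the late half of the dialogue minus the early half.

    Assumption: a simple stand-in for "increasing score within a domain";
    positive values indicate amplification.
    """
    series = [row[domain] for row in turn_scores]
    if len(series) < 2:
        raise ValueError("need at least two scored turns")
    half = len(series) // 2
    early, late = series[:half], series[half:]
    return sum(late) / len(late) - sum(early) / len(early)


# Hypothetical dialogue: four scored responses, seven domains each.
dialogue = [
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [2, 1, 0, 0, 0, 0, 0],
    [3, 1, 0, 1, 0, 0, 0],
]
new_domains = domain_expansion(dialogue)        # domains 1 and 3 appear later
amplification = domain_amplification(dialogue, 0)  # 2.5 - 1.0 = 1.5
```

With 7 dialogues, 3 LLMs, and 5 repeats each, the design yields 7 x 3 x 5 = 105 dialogue runs, which matches the 88/105 denominator reported in the Results.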

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Journal of Medical Internet Research: 85 papers in training set; Top 0.3%; 12.3%
2. npj Digital Medicine: 97 papers in training set; Top 0.6%; 8.4%
3. Frontiers in Digital Health: 20 papers in training set; Top 0.1%; 7.1%
4. JMIR Formative Research: 32 papers in training set; Top 0.2%; 6.3%
5. BJPsych Open: 25 papers in training set; Top 0.1%; 3.9%
6. Journal of the American Medical Informatics Association: 61 papers in training set; Top 0.8%; 3.6%
7. Frontiers in Psychiatry: 83 papers in training set; Top 1%; 3.2%
8. PLOS ONE: 4510 papers in training set; Top 42%; 3.1%
9. Scientific Reports: 3102 papers in training set; Top 42%; 3.1%

50% of probability mass above

10. Journal of General Internal Medicine: 20 papers in training set; Top 0.3%; 2.6%
11. Acta Neuropsychiatrica: 12 papers in training set; Top 0.2%; 2.6%
12. Philosophical Transactions of the Royal Society B: 51 papers in training set; Top 3%; 1.9%
13. Acta Psychiatrica Scandinavica: 10 papers in training set; Top 0.2%; 1.7%
14. BMC Bioinformatics: 383 papers in training set; Top 5%; 1.7%
15. JAMA Pediatrics: 10 papers in training set; Top 0.1%; 1.7%
16. BMJ Open: 554 papers in training set; Top 9%; 1.7%
17. Journal of Biomedical Informatics: 45 papers in training set; Top 0.8%; 1.7%
18. JAMA Network Open: 127 papers in training set; Top 3%; 1.2%
19. Biological Psychiatry: 119 papers in training set; Top 2%; 0.9%
20. JAMIA Open: 37 papers in training set; Top 1%; 0.9%
21. International Journal of Drug Policy: 11 papers in training set; Top 0.3%; 0.9%
22. Psychiatry Research: 35 papers in training set; Top 1%; 0.9%
23. JMIR Research Protocols: 18 papers in training set; Top 1%; 0.9%
24. JMIRx Med: 31 papers in training set; Top 1%; 0.9%
25. Bioinformatics: 1061 papers in training set; Top 9%; 0.9%
26. Annals of Internal Medicine: 27 papers in training set; Top 0.8%; 0.9%
27. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 2%; 0.9%
28. Translational Psychiatry: 219 papers in training set; Top 4%; 0.7%
29. Nature Medicine: 117 papers in training set; Top 5%; 0.7%
30. European Psychiatry: 10 papers in training set; Top 0.7%; 0.7%
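
The 50% cutoff stated above can be checked against the listed probabilities with a short cumulative sum. This is a sketch only: the percentages are copied from the first ten ranks of the list, and `cutoff_rank` is a hypothetical helper name, not part of the site's tooling.

```python
from itertools import accumulate
from typing import List

# Predicted probabilities (%) for ranks 1-10, as listed above.
probs = [12.3, 8.4, 7.1, 6.3, 3.9, 3.6, 3.2, 3.1, 3.1, 2.6]


def cutoff_rank(percentages: List[float], threshold: float = 50.0) -> int:
    """Return the 1-based rank at which the cumulative percentage first reaches the threshold."""
    for rank, total in enumerate(accumulate(percentages), start=1):
        if total >= threshold:
            return rank
    raise ValueError("threshold not reached by the supplied percentages")


rank_at_half = cutoff_rank(probs)
```

The cumulative total is about 47.9% after rank 8 and about 51.0% after rank 9, so `rank_at_half` is 9, consistent with the statement that the top 9 journals account for 50% of the predicted probability mass.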