
Evaluating Large Language Models for Assessment of Psychosis Risk

Zhu, T.; Tashevski, A.; Taquet, M.; Azis, M.; Jani, T.; Broome, M. R.; Kabir, T.; Minichino, A.; Murray, G. K.; Nour, M. M.; Singh, I.; Fusar-Poli, P.; Nevado-Holgado, A.; McGuire, P.; Oliver, D.

2026-04-04 · psychiatry and clinical psychology
medRxiv · DOI: 10.64898/2026.04.02.26349960

Psychosis prevention relies on early detection of individuals at clinical high risk for psychosis (CHR-P), but detection of the CHR-P state remains limited, in part because clinical assessments require specialist interpretation of narrative interviews, constraining scalability of preventive care. Here, we evaluate whether large language models (LLMs; deep learning models trained on large text corpora to process and generate language) can extract clinically meaningful information from such interviews to support psychosis risk assessment. We assessed 11 open-weight LLMs on 678 PSYCHS interview transcripts from 373 participants (77.7% CHR-P). Models inferred CHR-P status and estimated severity and frequency across 15 symptom domains, benchmarked against researcher-rated scores. Larger models achieved the strongest classification performance (Llama-3.3-70B: accuracy = 0.80, sensitivity = 0.93, specificity = 0.58). LLM-generated symptom scores showed good correlations with researcher-rated scores (ICCsev = 0.74, ICCfreq = 0.75). Performance disparities were minimal across most demographic groups but varied across sites. Generated summaries were largely faithful to source transcripts, with low rates of clinically relevant confabulation (3%). Errors primarily reflected over-pathologisation of non-clinical experiences. While accuracy scaled with model size, smaller models achieved competitive performance with substantially lower computational cost. These findings demonstrate that open-weight LLMs can assess psychosis risk from clinical interview transcripts, supporting scalable, human-in-the-loop approaches to early detection.
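For readers less familiar with the classification metrics quoted in the abstract, the sketch below shows how accuracy, sensitivity, and specificity are derived from confusion-matrix counts. The counts used here are hypothetical (chosen only to roughly match the cohort's 77.7% CHR-P prevalence), not the study's actual results.

```python
# Illustration only: relating accuracy, sensitivity, and specificity to
# raw confusion-matrix counts. The counts are hypothetical, not the paper's.

def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from raw counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true-positive rate on CHR-P cases
    specificity = tn / (tn + fp)   # true-negative rate on non-CHR-P cases
    return accuracy, sensitivity, specificity

# Hypothetical counts for a 373-participant cohort at ~77.7% CHR-P prevalence.
acc, sens, spec = classification_metrics(tp=270, fn=20, tn=48, fp=35)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Note that with a high-prevalence cohort like this one, sensitivity dominates overall accuracy, which is why the reported specificity of 0.58 can coexist with an accuracy of 0.80.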

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Probability | Percentile | Papers in training set
1 | Nature Medicine | 19.2% | Top 0.1% | 117
2 | Schizophrenia Bulletin | 10.0% | Top 0.1% | 29
3 | JAMA Psychiatry | 8.3% | Top 0.1% | 13
4 | Biological Psychiatry | 4.8% | Top 0.7% | 119
5 | npj Digital Medicine | 4.8% | Top 0.9% | 97
6 | Nature Neuroscience | 3.9% | Top 2% | 216
-- 50% of probability mass above --
7 | Acta Psychiatrica Scandinavica | 3.6% | Top 0.1% | 10
8 | Neuropsychopharmacology | 3.5% | Top 0.9% | 134
9 | Translational Psychiatry | 3.5% | Top 2% | 219
10 | Nature Communications | 2.7% | Top 44% | 4913
11 | The British Journal of Psychiatry | 2.6% | Top 0.3% | 21
12 | Schizophrenia | 2.6% | Top 0.2% | 19
13 | Molecular Psychiatry | 1.9% | Top 2% | 242
14 | American Journal of Psychiatry | 1.8% | Top 0.1% | 20
15 | Scientific Reports | 1.8% | Top 56% | 3102
16 | Frontiers in Psychiatry | 1.7% | Top 2% | 83
17 | Nature Mental Health | 1.3% | Top 0.2% | 18
18 | NeuroImage: Clinical | 1.2% | Top 3% | 132
19 | eLife | 1.2% | Top 49% | 5422
20 | Communications Medicine | 1.1% | Top 0.6% | 85
21 | European Psychiatry | 0.9% | Top 0.5% | 10
22 | Genome Medicine | 0.9% | Top 7% | 154
23 | BMC Medicine | 0.8% | Top 6% | 163
24 | Biological Psychiatry Global Open Science | 0.8% | Top 1% | 54
25 | Nature | 0.7% | Top 16% | 575
26 | Biological Psychiatry: Cognitive Neuroscience and Neuroimaging | 0.7% | Top 2% | 62
27 | European Child & Adolescent Psychiatry | 0.7% | Top 0.4% | 14
28 | Cell | 0.6% | Top 19% | 370
29 | Brain | 0.6% | Top 5% | 154
30 | Science | 0.6% | Top 21% | 429
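As a quick check of the statement that the top 6 journals account for 50% of the predicted probability mass, one can accumulate the percentages listed above (a minimal sketch; values copied from the list):

```python
# Verifying the "top 6 journals cover 50% of the probability mass" claim
# by accumulating the listed percentages in rank order.
from itertools import accumulate

top6 = [19.2, 10.0, 8.3, 4.8, 4.8, 3.9]  # ranks 1-6, as listed above
running = list(accumulate(top6))

# Rank at which the cumulative mass first reaches 50%:
rank_at_half = next(i + 1 for i, total in enumerate(running) if total >= 50.0)
print(rank_at_half, round(running[-1], 1))
```

The cumulative mass stands at about 47.1% after rank 5 and first crosses 50% at rank 6 (reaching about 51.0%), consistent with the cutoff shown in the list.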