
Large Language Models for Psychiatric Phenotype Extraction from Electronic Health Records

Frydman-Gani, C.; Arias, A.; Perez Vallejo, M.; Londono Martinez, J. D.; Valencia-Echeverry, J.; Castano, M.; Bui, A. A. T.; Freimer, N. B.; Lopez-Jaramillo, C.; Olde Loohuis, L. M.

2025-08-12 · Psychiatry and Clinical Psychology
medRxiv · DOI: 10.1101/2025.08.07.25333172

The accurate detection of clinical phenotypes from electronic health records (EHRs) is pivotal for advancing large-scale genetic and longitudinal studies in psychiatry. Free-text clinical notes are an essential source of symptom-level information, particularly in psychiatry. However, the automated extraction of symptoms from clinical text remains challenging. Here, we tested 11 open-source generative large language models (LLMs) for their ability to detect 109 psychiatric phenotypes from clinical text, using annotated EHR notes from a psychiatric clinic in Colombia. The LLMs were evaluated both "out-of-the-box" and after fine-tuning, and compared against a traditional natural language processing (tNLP) method developed from the same data. We show that while base LLM performance was poor to moderate (0.2-0.6 macro-F1 for zero-shot; 0.2-0.74 macro-F1 for few-shot), it improved significantly after fine-tuning (0.75-0.86 macro-F1), with several fine-tuned LLMs outperforming the tNLP method. In total, 100 phenotypes could be reliably detected (F1 > 0.8) using either a fine-tuned LLM or tNLP. To generate a fine-tuned LLM that can be shared with the scientific and medical community, we created a fully synthetic dataset free of patient information but based on the original annotations. We fine-tuned a top-performing LLM on these data, creating "Mistral-small-psych", an LLM that can detect psychiatric phenotypes from Spanish text with performance comparable to that of LLMs trained on real EHR data (macro-F1 = 0.79). Finally, the fine-tuned LLMs underwent external validation using data from a large psychiatric hospital in Colombia, the Hospital Mental de Antioquia, highlighting that most LLMs generalized well (0.02-0.16 point loss in macro-F1). Our study underscores the value of domain-specific adaptation of LLMs and introduces a new model for accurate psychiatric phenotyping in Spanish text, paving the way for global precision psychiatry.
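The abstract compares models by macro-F1, i.e. the F1 score computed per phenotype and then averaged so that rare and common phenotypes count equally. A minimal sketch of that metric (not the authors' code; the phenotype names and toy predictions below are invented for illustration):

```python
# Illustrative sketch of macro-F1 over binary phenotype labels,
# the evaluation metric reported in the abstract.

def f1(tp, fp, fn):
    """Per-phenotype F1; defined as 0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred, phenotypes):
    """Average per-phenotype F1, weighting every phenotype equally."""
    scores = []
    for p in phenotypes:
        tp = sum(t[p] and r[p] for t, r in zip(y_true, y_pred))
        fp = sum((not t[p]) and r[p] for t, r in zip(y_true, y_pred))
        fn = sum(t[p] and (not r[p]) for t, r in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)

# Toy example: two notes, three hypothetical phenotypes.
phenos = ["anhedonia", "insomnia", "delusions"]
truth = [{"anhedonia": 1, "insomnia": 0, "delusions": 1},
         {"anhedonia": 0, "insomnia": 1, "delusions": 0}]
preds = [{"anhedonia": 1, "insomnia": 0, "delusions": 0},
         {"anhedonia": 0, "insomnia": 1, "delusions": 0}]
print(round(macro_f1(truth, preds, phenos), 3))  # → 0.667
```

Because each of the 109 phenotypes contributes equally to the average, a model cannot score well by detecting only the frequent symptoms, which is why macro-F1 is a stringent choice for this task.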

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Nature Medicine · 117 papers in training set · Top 0.1% · 14.3%
2. Frontiers in Psychiatry · 83 papers in training set · Top 0.2% · 12.5%
3. Translational Psychiatry · 219 papers in training set · Top 0.7% · 8.4%
4. Acta Psychiatrica Scandinavica · 10 papers in training set · Top 0.1% · 7.2%
5. npj Digital Medicine · 97 papers in training set · Top 1% · 4.0%
6. Scientific Reports · 3102 papers in training set · Top 45% · 2.6%
7. Schizophrenia · 19 papers in training set · Top 0.2% · 2.4%

(50% of the predicted probability mass lies above this point.)

8. Communications Medicine · 85 papers in training set · Top 0.1% · 2.1%
9. Biological Psychiatry · 119 papers in training set · Top 1% · 1.9%
10. JAMA Psychiatry · 13 papers in training set · Top 0.2% · 1.9%
11. NeuroImage: Clinical · 132 papers in training set · Top 2% · 1.8%
12. European Journal of Human Genetics · 49 papers in training set · Top 0.6% · 1.7%
13. Journal of Affective Disorders · 81 papers in training set · Top 1.0% · 1.7%
14. Psychiatry Research · 35 papers in training set · Top 0.9% · 1.7%
15. Schizophrenia Bulletin · 29 papers in training set · Top 0.4% · 1.7%
16. Nature Human Behaviour · 85 papers in training set · Top 2% · 1.7%
17. PLOS ONE · 4510 papers in training set · Top 54% · 1.7%
18. Genome Medicine · 154 papers in training set · Top 5% · 1.5%
19. Nature Communications · 4913 papers in training set · Top 55% · 1.3%
20. Neuropsychopharmacology · 134 papers in training set · Top 2% · 1.3%
21. BioData Mining · 15 papers in training set · Top 0.5% · 1.2%
22. Bioinformatics · 1061 papers in training set · Top 9% · 0.9%
23. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics · 22 papers in training set · Top 0.3% · 0.9%
24. Nature Mental Health · 18 papers in training set · Top 0.2% · 0.9%
25. Nature Genetics · 240 papers in training set · Top 6% · 0.9%
26. Journal of Medical Internet Research · 85 papers in training set · Top 4% · 0.9%
27. BMC Medicine · 163 papers in training set · Top 6% · 0.8%
28. European Psychiatry · 10 papers in training set · Top 0.6% · 0.8%
29. American Journal of Psychiatry · 20 papers in training set · Top 0.5% · 0.7%
30. Frontiers in Digital Health · 20 papers in training set · Top 1% · 0.7%