
Identifying Psychosis Episodes in Psychiatric Admission Notes via Rule-based Methods, Machine Learning, and Pre-Trained Language Models

Hua, Y.; Blackley, S. V.; Shinn, A. K.; Skinner, J. P.; Moran, L. V.; Zhou, L.

2024-03-19 psychiatry and clinical psychology
10.1101/2024.03.18.24304475 medRxiv

Early and accurate diagnosis is crucial for effective treatment and improved outcomes, yet identifying psychotic episodes is challenging because of their complex nature and the varied presentation of symptoms among individuals. One of the primary difficulties is the underreporting and underdiagnosis of psychosis, compounded by the stigma surrounding mental health and by individuals' often diminished insight into their condition. Existing efforts that use Electronic Health Records (EHRs) to retrospectively identify psychosis typically rely on structured data, such as medical codes and patient demographics, which frequently lack essential information. To address these challenges, our study applies Natural Language Processing (NLP) algorithms to psychiatric admission notes for the diagnosis of psychosis, and provides a detailed evaluation of rule-based algorithms, machine learning models, and pre-trained language models. The study also investigates whether using keywords to condense extensive note data before training and evaluating the models is effective. Analyzing 4,617 initial psychiatric admission notes (1,196 cases of psychosis versus 3,433 controls) from 2005 to 2019, we found that the XGBoost classifier using Term Frequency-Inverse Document Frequency (TF-IDF) features derived from notes pre-selected by expert-curated keywords attained the highest performance, with an F1 score of 0.8881 (AUROC [95% CI]: 0.9725 [0.9717, 0.9733]). BlueBERT demonstrated comparable efficacy, with an F1 score of 0.8841 (AUROC [95% CI]: 0.97 [0.9580, 0.9820]) on the same set of notes. Both models markedly outperformed traditional International Classification of Diseases (ICD) code-based detection from discharge summaries, which had an F1 score of 0.7608, an improvement of roughly 0.12 in F1. Furthermore, our findings indicate that keyword pre-selection markedly enhances the performance of both machine learning and pre-trained language models. This study illustrates the potential of NLP techniques to improve psychosis detection within admission notes and aims to serve as a foundational reference for future research on applying NLP for psychosis identification in EHR notes.
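To make the keyword pre-selection and TF-IDF steps described in the abstract more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. Several things here are assumptions: the keyword stems in `KEYWORDS` are hypothetical (the study's expert-curated list is not given on this page), the toy notes are invented, pre-selection is read as "keep only sentences that mention a keyword", and the TF-IDF weighting is the textbook `tf * log(N / df)` form rather than whatever variant the study used. The classifier itself (XGBoost or BlueBERT) is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical keyword stems; the study's expert-curated list is not shown here.
KEYWORDS = ("hallucinat", "delusion", "paranoi", "psychosis", "psychotic")


def select_sentences(note, keywords=KEYWORDS):
    """Keep only sentences that mention at least one keyword (case-insensitive).

    Assumption: pre-selection works at sentence level. Negation is not handled,
    so "No delusions elicited." is kept as well.
    """
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    kept = [s for s in sentences if any(k in s.lower() for k in keywords)]
    return " ".join(kept)


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Toy, made-up notes for illustration only.
notes = [
    "Patient reports auditory hallucinations. Sleep is poor. Denies substance use.",
    "Admitted for low mood. No delusions elicited. Appetite reduced.",
    "Presents with anxiety. Sleep is poor.",
]
selected = [select_sentences(n) for n in notes]
# Notes with no keyword hit are dropped before feature extraction.
features = tfidf([s for s in selected if s])
```

In this sketch the third note contains no keyword and is dropped, and the remaining two notes are reduced to a single sentence each before weighting. The resulting sparse term weights are the kind of input a downstream classifier would be trained on.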

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Acta Psychiatrica Scandinavica | 10 | Top 0.1% | 9.9%
2 | Frontiers in Psychiatry | 83 | Top 0.4% | 8.3%
3 | Psychiatry Research | 35 | Top 0.2% | 7.0%
4 | BMC Medical Informatics and Decision Making | 39 | Top 0.5% | 6.3%
5 | npj Digital Medicine | 97 | Top 0.8% | 6.2%
6 | Journal of Medical Internet Research | 85 | Top 1.0% | 4.8%
7 | Schizophrenia Bulletin | 29 | Top 0.2% | 4.8%
8 | Translational Psychiatry | 219 | Top 1% | 4.2%
50% of probability mass above
9 | Acta Neuropsychiatrica | 12 | Top 0.2% | 3.5%
10 | PLOS ONE | 4510 | Top 41% | 3.5%
11 | Scientific Reports | 3102 | Top 39% | 3.5%
12 | Schizophrenia | 19 | Top 0.1% | 3.5%
13 | JMIR Formative Research | 32 | Top 0.5% | 2.7%
14 | Schizophrenia Research | 29 | Top 0.3% | 2.3%
15 | BioData Mining | 15 | Top 0.2% | 2.0%
16 | Frontiers in Digital Health | 20 | Top 0.7% | 1.7%
17 | JAMIA Open | 37 | Top 0.9% | 1.5%
18 | Journal of Affective Disorders | 81 | Top 1% | 1.2%
19 | JAMA Psychiatry | 13 | Top 0.4% | 1.2%
20 | JMIRx Med | 31 | Top 1% | 0.9%
21 | BJPsych Open | 25 | Top 0.6% | 0.9%
22 | Journal of Biomedical Informatics | 45 | Top 1% | 0.8%
23 | NeuroImage: Clinical | 132 | Top 4% | 0.8%
24 | PLOS Digital Health | 91 | Top 3% | 0.7%
25 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 2% | 0.7%
26 | BMC Psychiatry | 22 | Top 0.8% | 0.7%
27 | Contemporary Clinical Trials Communications | 11 | Top 0.7% | 0.7%
28 | Life | 27 | Top 0.5% | 0.7%
29 | European Psychiatry | 10 | Top 0.7% | 0.7%
30 | Frontiers in Public Health | 140 | Top 9% | 0.6%
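
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below assumes that "account for 50%" means the smallest number of top-ranked journals whose cumulative probability reaches 50%. The function name `journals_needed` is illustrative, and the listed values are rounded, so the running total is approximate.

```python
# Probabilities in percent, in rank order, as listed for the 30 journals.
probs = [
    9.9, 8.3, 7.0, 6.3, 6.2, 4.8, 4.8, 4.2, 3.5, 3.5,
    3.5, 3.5, 2.7, 2.3, 2.0, 1.7, 1.5, 1.2, 1.2, 0.9,
    0.9, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6,
]


def journals_needed(probabilities, threshold=50.0):
    """Return (rank, cumulative mass) at the first rank reaching the threshold."""
    running = 0.0
    for rank, p in enumerate(probabilities, start=1):
        running += p
        if running >= threshold:
            return rank, running
    # Threshold never reached with the values provided.
    return None, running


rank, mass = journals_needed(probs)
```

With the rounded values above, the cumulative total is about 47.3% after seven journals and about 51.5% after eight, which is consistent with the cutoff shown in the list.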