Machine Learning in Psychiatric Health Records: A Gold Standard Approach to Trauma Annotation

Atwood, B.; Holderness, E.; Verhagen, M.; Shinn, A. K.; Cawkwell, P.; Cerruti, H.; Pustejovsky, J.; Hall, M.-H.

2025-03-11 psychiatry and clinical psychology
10.1101/2025.03.09.25323272 medRxiv

Psychiatric electronic health records present unique challenges for machine learning due to their unstructured, complex, and variable nature. This study aimed to create a publicly available gold standard dataset by identifying a cohort of patients with psychotic disorders and posttraumatic stress disorder (PTSD), developing clinically informed guidelines for annotating traumatic events in their health records, and demonstrating the dataset's suitability for training machine learning models to detect indicators of symptoms, substance use, and trauma in new records. We compiled a representative corpus of 200 narrative-heavy health records (470,489 tokens) from a centralized database and developed a detailed annotation scheme with a team of clinical experts and computational linguists. Clinicians annotated the corpus for trauma-related events and relevant clinical information with high inter-annotator agreement (0.715 for entity/span tags and 0.874 for attributes). Additionally, machine learning models were developed to demonstrate the practical viability of the gold standard corpus, achieving micro F1 scores of 0.76 for spans and 0.82 for attributes, indicative of their predictive reliability. This study established the first gold standard dataset for the complex task of labelling traumatic features in psychiatric health records. High inter-annotator agreement and model performance illustrate its utility in advancing the application of machine learning in psychiatric healthcare in order to better understand disease heterogeneity and treatment implications.
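
For readers unfamiliar with the metric, the micro F1 score quoted in the abstract pools true positives, false positives, and false negatives across all labels before computing precision and recall. The sketch below is only an illustration of that definition: the label names and counts are made up for the example and are not results or categories taken from the study, and `micro_f1` is a hypothetical helper, not the authors' evaluation code.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-label tp/fp/fn counts."""
    # Pool the counts over every label first (this is what "micro" means).
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, invented for illustration only.
example_counts = {
    "symptom": {"tp": 40, "fp": 10, "fn": 10},
    "substance_use": {"tp": 25, "fp": 5, "fn": 10},
    "trauma": {"tp": 15, "fp": 5, "fn": 5},
}

score = micro_f1(example_counts)
print(f"micro F1 = {score:.3f}")
```

Because the counts are pooled, frequent labels weigh more heavily in a micro average than rare ones, which is worth keeping in mind when reading a single aggregate score.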

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Acta Psychiatrica Scandinavica; 10 papers in training set; Top 0.1%; 14.1%
2. Frontiers in Psychiatry; 83 papers in training set; Top 0.1%; 14.1%
3. Psychiatry Research; 35 papers in training set; Top 0.2%; 7.0%
4. Acta Neuropsychiatrica; 12 papers in training set; Top 0.1%; 6.2%
5. Translational Psychiatry; 219 papers in training set; Top 1%; 4.8%
6. BMC Medical Informatics and Decision Making; 39 papers in training set; Top 0.6%; 4.8%
50% of probability mass above
7. npj Digital Medicine; 97 papers in training set; Top 1%; 3.5%
8. Journal of Medical Internet Research; 85 papers in training set; Top 2%; 3.5%
9. Scientific Reports; 3102 papers in training set; Top 39%; 3.5%
10. PLOS ONE; 4510 papers in training set; Top 43%; 3.0%
11. Journal of Affective Disorders; 81 papers in training set; Top 0.8%; 2.0%
12. European Psychiatry; 10 papers in training set; Top 0.3%; 1.9%
13. BJPsych Open; 25 papers in training set; Top 0.3%; 1.9%
14. Frontiers in Digital Health; 20 papers in training set; Top 0.5%; 1.9%
15. JMIRx Med; 31 papers in training set; Top 1.0%; 1.3%
16. Nature Medicine; 117 papers in training set; Top 3%; 1.3%
17. Schizophrenia Bulletin; 29 papers in training set; Top 0.5%; 1.3%
18. BioData Mining; 15 papers in training set; Top 0.6%; 0.9%
19. Frontiers in Artificial Intelligence; 18 papers in training set; Top 0.5%; 0.9%
20. Scientific Data; 174 papers in training set; Top 3%; 0.7%
21. NeuroImage: Clinical; 132 papers in training set; Top 4%; 0.7%
22. JAMA Psychiatry; 13 papers in training set; Top 0.6%; 0.7%
23. BMC Psychiatry; 22 papers in training set; Top 0.8%; 0.7%
24. Journal of the American Medical Informatics Association; 61 papers in training set; Top 2%; 0.7%
25. The British Journal of Psychiatry; 21 papers in training set; Top 1%; 0.7%
26. Epidemiology and Psychiatric Sciences; 10 papers in training set; Top 0.5%; 0.6%
27. Contemporary Clinical Trials Communications; 11 papers in training set; Top 0.8%; 0.6%
28. JAMIA Open; 37 papers in training set; Top 2%; 0.6%
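
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked directly from the percentages in the ranked list above. The following is a minimal sketch, assuming the listed values are the per-journal predicted probabilities in rank order; `journals_to_reach` is a hypothetical helper name, not part of any tool behind this page.

```python
# Percentages copied from the ranked list above, in rank order (1 to 28).
probs_percent = [
    14.1, 14.1, 7.0, 6.2, 4.8, 4.8, 3.5, 3.5, 3.5, 3.0,
    2.0, 1.9, 1.9, 1.9, 1.3, 1.3, 1.3, 0.9, 0.9, 0.7,
    0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6,
]


def journals_to_reach(values, threshold=50.0):
    """Return (rank, cumulative) at the first rank whose running total meets the threshold."""
    total = 0.0
    for rank, value in enumerate(values, start=1):
        total += value
        if total >= threshold:
            return rank, total
    # Threshold never reached with the values given.
    return None, total


rank, cumulative = journals_to_reach(probs_percent)
print(f"Threshold reached at rank {rank} with {cumulative:.1f}% cumulative")
```

With the values as listed, the running total first passes 50% at the sixth journal, which matches the cutoff marker in the list.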