Development and validation of electronic health record-based ascertainment of obsessive-compulsive disorder cases and controls

Wang, B.; Miller-Fleming, T. W.; Yu, D.; Hucks, D.; Gantz, E.; Johnston, R.; Maxwell-Horn, A.; Cox, N.; Sutcliffe, J.; Mathews, C. A.; McArthur, E.; Hatfield, H.; Kabir, D.; Giangrande, E. J.; Fortgang, R. G.; Wang, S. B.; Karmacharya, R.; Roffman, J. L.; Scharf, J. M.; Smoller, J. W.; Soda, T.; Crowley, J. J.; Davis, L. K.

2025-08-07 psychiatry and clinical psychology
10.1101/2025.08.05.25332874 medRxiv
Objectives: Obsessive-compulsive disorder (OCD) is a common psychiatric disorder, with two-thirds of affected individuals reporting severe impairment. Despite its substantial burden and moderate heritability, its etiology remains poorly understood, and treatments are often suboptimal. Although recent genome-wide association studies (GWAS) have identified some risk loci, OCD remains in the linear phase of sample collection to variant association, with many more OCD-associated variants left to discover. This study aimed to develop and validate an electronic health record (EHR)-based algorithm to identify OCD cases and facilitate large-scale genetic studies.

Methods: We leveraged EHR-linked biobank data from two large hospital systems, Vanderbilt University Medical Center (VUMC) and Mass General Brigham (MGB), to develop a high-throughput phenotyping algorithm integrating diagnostic codes, medication records, and natural language processing (NLP) of clinical notes. Algorithm performance was evaluated through expert chart review, and genetic validation was performed using OCD polygenic risk scores (PRS).

Results: Expert chart reviews found that our algorithm combining ICD codes and NLP achieved higher positive predictive values (PPV) for OCD cases (0.84 at VUMC and 0.91 at MGB) than either ICD codes or NLP alone, albeit with a lower case yield. Furthermore, at both sites, algorithm-determined cases exhibited significantly elevated PRS derived from the latest OCD GWAS, providing genetic validation of our phenotyping approach.

Conclusion: Our study demonstrates a scalable and cost-efficient approach for EHR-based ascertainment of OCD cases, facilitating large-scale genetic studies and advancing understanding of the disorder's complex etiology.
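
For readers unfamiliar with the metric reported in the Results, positive predictive value is the fraction of algorithm-flagged cases that expert chart review confirms: PPV = TP / (TP + FP). The sketch below is only an illustration of that formula. The function name and the counts are hypothetical and are not taken from the study.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return PPV = TP / (TP + FP) for a set of reviewed, algorithm-flagged charts."""
    reviewed = true_positives + false_positives
    if reviewed == 0:
        raise ValueError("at least one reviewed chart is required")
    return true_positives / reviewed


# Hypothetical counts for illustration only (NOT from the study):
# 18 flagged charts confirmed as OCD on review, 2 not confirmed.
example_ppv = positive_predictive_value(true_positives=18, false_positives=2)
print(f"PPV = {example_ppv:.2f}")
```

A higher PPV means fewer false cases enter a downstream genetic analysis, which is the trade-off the abstract describes against a lower case yield.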

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | American Journal of Medical Genetics Part B: Neuropsychiatric Genetics | 22 papers in training set | Top 0.1% | 16.9%
2 | Translational Psychiatry | 219 papers in training set | Top 0.4% | 12.1%
3 | American Journal of Psychiatry | 20 papers in training set | Top 0.1% | 8.1%
4 | Psychiatry Research | 35 papers in training set | Top 0.3% | 6.1%
5 | Frontiers in Psychiatry | 83 papers in training set | Top 0.9% | 4.0%
6 | Journal of Affective Disorders | 81 papers in training set | Top 0.5% | 3.8%
(50% of probability mass above)
7 | Acta Neuropsychiatrica | 12 papers in training set | Top 0.2% | 3.5%
8 | Schizophrenia Bulletin | 29 papers in training set | Top 0.3% | 3.5%
9 | Acta Psychiatrica Scandinavica | 10 papers in training set | Top 0.1% | 2.5%
10 | Molecular Psychiatry | 242 papers in training set | Top 1% | 2.3%
11 | European Psychiatry | 10 papers in training set | Top 0.3% | 2.0%
12 | PLOS ONE | 4510 papers in training set | Top 49% | 2.0%
13 | Psychological Medicine | 74 papers in training set | Top 0.8% | 2.0%
14 | Biological Psychiatry | 119 papers in training set | Top 2% | 1.8%
15 | JAMA Psychiatry | 13 papers in training set | Top 0.2% | 1.8%
16 | Biological Psychiatry Global Open Science | 54 papers in training set | Top 0.7% | 1.6%
17 | BMC Psychiatry | 22 papers in training set | Top 0.4% | 1.4%
18 | The British Journal of Psychiatry | 21 papers in training set | Top 0.7% | 1.3%
19 | Scientific Reports | 3102 papers in training set | Top 68% | 1.1%
20 | Journal of Psychiatric Research | 28 papers in training set | Top 0.6% | 1.1%
21 | JAMA Network Open | 127 papers in training set | Top 3% | 0.9%
22 | BMJ Mental Health | 15 papers in training set | Top 0.3% | 0.9%
23 | Biological Psychiatry: Cognitive Neuroscience and Neuroimaging | 62 papers in training set | Top 1% | 0.9%
24 | Neuropsychopharmacology | 134 papers in training set | Top 2% | 0.8%
25 | European Neuropsychopharmacology | 15 papers in training set | Top 0.6% | 0.7%
26 | Psychiatry and Clinical Neurosciences | 11 papers in training set | Top 0.4% | 0.7%