Extracting patient reported cannabis use and reasons for use from electronic health records: a benchmarking study of large language models

Wang, Y.; Bozkurt, S.; Le, N.; Alagappan, A.; Huang, C.; Rajwal, S.; Lewis, A.; Kim, J.; Falasinnu, T.

2026-03-09 · health informatics · medRxiv
doi: 10.64898/2026.03.06.26347824
Objective: To develop and evaluate a scalable, reproducible natural language processing (NLP) approach using large language models (LLMs) to identify cannabis use status and reasons for cannabis use among patients with autoimmune rheumatic diseases (ARDs) from unstructured electronic health record (EHR) clinical notes.

Methods and Analysis: We conducted a retrospective study using EHR clinical notes from patients with ARDs (2015-2024). Notes were screened for cannabis-related mentions using fuzzy string matching against a curated keyword lexicon (similarity threshold 90), extracting 50-word context windows (±25 words around each hit). Two domain experts annotated 886 randomly sampled snippets into four classes: (1) not a true cannabis mention/uncertain, (2) denial of use, (3) positive past use, and (4) positive current use. Using these annotations, we compared multiple LLM prompting strategies (zero-shot to few-shot; temperature tuning) against a fine-tuned clinical model (GatorTron 345M). For reason for use, 1,027 snippets were annotated into six categories: pain, nausea, sleep, anxiety/stress/mood, appetite, and not mentioned/unknown. Models were evaluated on a held-out validation set using accuracy, F1, recall, and precision. Snippet-level predictions were then aggregated to the patient level to describe temporal trends and subgroup differences.

Results: For cannabis use status classification, the fine-tuned GatorTron model achieved the highest performance (accuracy 0.90; F1 0.91; recall 0.90; precision 0.90). For reason-for-use classification, gpt-oss-20B performed best (accuracy 0.77; F1 0.77; recall 0.77; precision 0.86). Patient-level analyses characterized trends in documented cannabis use from 2015 to 2024 and compared clinical characteristics between current users and patients who denied use.
Conclusion: High-precision extraction of cannabis use status and reasons for use from EHR notes is feasible by combining fine-tuned clinical language models with LLM-based classifiers. This approach enables scalable measurement of patient-reported symptom self-management strategies in ARDs, supporting observational research and potential clinical decision support.
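The screening step described in the Methods (fuzzy keyword matching with a similarity threshold of 90, then a ±25-word context window around each hit) can be sketched as follows. Only the threshold and window size come from the abstract; the sample lexicon, tokenization, and the use of the standard-library difflib scorer (rather than whatever fuzzy-matching library the authors actually used) are illustrative assumptions.

```python
import difflib
import re

# Illustrative keywords only; the paper's curated lexicon is not published here.
LEXICON = ["cannabis", "marijuana", "thc", "cbd", "edible"]

def fuzzy_score(a: str, b: str) -> float:
    """Similarity on a 0-100 scale, comparable to common fuzzy-matching libraries."""
    return 100.0 * difflib.SequenceMatcher(None, a, b).ratio()

def extract_snippets(note: str, threshold: float = 90.0, window: int = 25):
    """Return ±`window`-word context snippets around fuzzy lexicon hits."""
    words = re.findall(r"\S+", note)
    snippets = []
    for i, w in enumerate(words):
        token = w.lower().strip(".,;:()")  # crude normalization for illustration
        if any(fuzzy_score(token, kw) >= threshold for kw in LEXICON):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            snippets.append(" ".join(words[lo:hi]))
    return snippets

# A misspelling like "marijuanna" still scores ~95 against "marijuana",
# which is the point of fuzzy rather than exact matching in clinical notes.
note = "Patient reports using marijuanna nightly for sleep; denies tobacco."
print(extract_snippets(note))
```

Snippets produced this way would then feed the annotation and classification stages (prompted LLMs or the fine-tuned GatorTron model) described in the abstract.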

Matching journals

The top 7 journals account for 50% of the predicted probability mass. Each entry shows the predicted probability, the number of papers in the training set, and the journal's percentile rank.

1. npj Digital Medicine: 14.8% (97 papers; top 0.3%)
2. International Journal of Medical Informatics: 10.2% (25 papers; top 0.1%)
3. Journal of Medical Internet Research: 7.2% (85 papers; top 0.6%)
4. Journal of the American Medical Informatics Association: 6.4% (61 papers; top 0.4%)
5. European Respiratory Journal: 4.9% (54 papers; top 0.3%)
6. JAMIA Open: 4.9% (37 papers; top 0.2%)
7. Frontiers in Digital Health: 4.2% (20 papers; top 0.2%)
8. Bioinformatics: 4.0% (1,061 papers; top 5%)
9. Journal of Biomedical Informatics: 3.9% (45 papers; top 0.4%)
10. PLOS ONE: 3.6% (4,510 papers; top 39%)
11. Scientific Reports: 3.1% (3,102 papers; top 41%)
12. Journal of Personalized Medicine: 2.6% (28 papers; top 0.1%)
13. JMIR Public Health and Surveillance: 1.8% (45 papers; top 1%)
14. BMC Bioinformatics: 1.7% (383 papers; top 4%)
15. Patterns: 1.7% (70 papers; top 0.8%)
16. Wellcome Open Research: 1.3% (57 papers; top 1%)
17. Rheumatology: 1.3% (21 papers; top 0.3%)
18. JMIR Formative Research: 1.3% (32 papers; top 1%)
19. Acta Neuropsychiatrica: 1.2% (12 papers; top 0.6%)
20. JMIR Medical Informatics: 1.2% (17 papers; top 1%)
21. BMC Medical Informatics and Decision Making: 1.1% (39 papers; top 2%)
22. PLOS Digital Health: 0.9% (91 papers; top 2%)
23. Computers in Biology and Medicine: 0.8% (120 papers; top 4%)
24. eBioMedicine: 0.7% (130 papers; top 5%)
25. Clinical Pharmacology & Therapeutics: 0.7% (25 papers; top 0.8%)
26. Trials: 0.6% (25 papers; top 2%)
27. Annals of the Rheumatic Diseases: 0.6% (32 papers; top 0.8%)
28. BMC Medical Research Methodology: 0.6% (43 papers; top 2%)
29. Nature Communications: 0.6% (4,913 papers; top 65%)
30. BMC Medical Genomics: 0.6% (36 papers; top 2%)