
A retrieval-augmented generation large language model framework for accurate dementia identification from electronic health records

Wang, L.; Liu, B.; Yang, R.; Chuang, Y.-W.; Estiri, H.; Murphy, S.; Zhou, L.; Marshall, G.

medRxiv preprint (health informatics), 2026-01-25. DOI: 10.64898/2026.01.24.26344477
Abstract

Objective: Accurate and scalable dementia phenotyping from electronic health records (EHRs) is foundational for population-level research, risk prediction, and learning health system interventions. Traditional rule- and keyword-based approaches are limited by inconsistent documentation and an inability to capture clinical nuance. We aim to develop and evaluate a framework that leverages large language models (LLMs) with retrieval-augmented generation (RAG) to overcome these limitations and improve dementia identification from real-world EHR data.

Methods: Using EHR data from the Mass General Brigham health system, we first assembled a cohort of adults with potential dementia based on diagnosis codes, problem lists, dementia-related medications, and free-text note mentions. A subset of candidate cases underwent detailed manual chart review to assign gold-standard dementia status. With this labeled sample, we implemented and compared three approaches for dementia ascertainment: (1) a rule-based classifier leveraging structured EHR data, (2) LLMs applied to keyword-filtered clinical note excerpts, and (3) a RAG-based LLM framework that integrates retrieved, context-rich note snippets. Within each approach, we evaluated multiple configurations of embedding models, retrieval methods, LLMs, structured-data inclusion, and prompts to identify the best-performing classifier. Performance was assessed using standard classification metrics, including sensitivity, specificity, positive predictive value (PPV), and F1 score, and supplemented by qualitative error analyses to characterize common sources of false positives and false negatives across methods.

Results: The RAG-based classifier achieved the highest performance (F1=0.933, sensitivity=91.1%, PPV=95.5%) compared to the rule-based classifier (F1=0.823, sensitivity=81.1%, PPV=83.5%) and the keyword-filtered LLM (F1=0.903, sensitivity=91.7%, PPV=88.6%). Including ICD codes alongside free text in the RAG-based LLM pipeline significantly reduced the PPV and modestly decreased the F1 score. Error analysis revealed that structured-code dependence contributed to false positives, whereas unrecognized contextual cues in notes drove false negatives.

Conclusion: A RAG-based LLM pipeline without structured ICD codes improved dementia ascertainment from EHR data compared with ICD-based rules and keyword-based filtering. This approach can enhance dementia case identification and support patient care, predictive modeling, and risk analysis.
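The reported F1 scores follow directly from the sensitivity (recall) and PPV (precision) figures, since F1 is their harmonic mean. A quick consistency check against the abstract's numbers, allowing a small tolerance because the inputs are rounded to three digits:

```python
def f1_score(sensitivity: float, ppv: float) -> float:
    # F1 is the harmonic mean of sensitivity (recall) and PPV (precision).
    return 2 * sensitivity * ppv / (sensitivity + ppv)

# (sensitivity, PPV, reported F1), as given in the abstract
reported = {
    "rule-based":           (0.811, 0.835, 0.823),
    "keyword-filtered LLM": (0.917, 0.886, 0.903),
    "RAG-based LLM":        (0.911, 0.955, 0.933),
}

for name, (sens, ppv, f1) in reported.items():
    recomputed = f1_score(sens, ppv)
    # Inputs are rounded, so allow a small tolerance.
    assert abs(recomputed - f1) < 0.005, (name, recomputed)
```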
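The paper does not publish its implementation, but the Methods describe a retrieve-then-classify pattern: embed note snippets, retrieve the ones most relevant to a dementia query, and prompt an LLM with the retrieved context. A minimal sketch of that pattern, with a toy bag-of-words cosine similarity standing in for the study's embedding models, and with illustrative (not real) note text, query, and prompt wording:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the study evaluates real embedding models.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, snippets: list[str], k: int = 2) -> list[str]:
    # Rank candidate note snippets by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]

def build_prompt(snippets: list[str]) -> str:
    # Assemble retrieved context into a classification prompt for an LLM.
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Based only on the following EHR note excerpts, answer YES or NO: "
        f"does this patient have dementia?\n{context}"
    )

# Illustrative note snippets (invented for this sketch, not study data)
notes = [
    "Patient seen for hypertension follow-up; BP well controlled.",
    "Progressive memory loss; MoCA 18/30; started on donepezil for dementia.",
    "Daughter reports worsening confusion and getting lost while driving.",
]

context = retrieve("dementia cognitive decline memory loss confusion", notes)
prompt = build_prompt(context)
# In the study, a prompt like this would be sent to an LLM for classification.
```

The study's finding that adding ICD codes hurt PPV corresponds, in this sketch, to keeping the prompt restricted to retrieved free-text context rather than appending structured codes.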

Matching journals

The top 5 journals account for just over 50% of the predicted probability mass (their shares sum to 53.0%). Each entry lists the journal, its predicted probability, its "top" percentile figure, and its paper count in the training set.

1. npj Digital Medicine (18.1%; top 0.3%; 97 papers in training set)
2. Journal of the American Medical Informatics Association (18.1%; top 0.1%; 61 papers in training set)
3. Journal of Biomedical Informatics (8.2%; top 0.2%; 45 papers in training set)
4. JAMIA Open (4.7%; top 0.3%; 37 papers in training set)
5. JMIR Medical Informatics (3.9%; top 0.3%; 17 papers in training set)

(50% of probability mass reached above this point)

6. BMC Medical Informatics and Decision Making (3.8%; top 0.7%; 39 papers in training set)
7. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring (3.5%; top 0.5%; 38 papers in training set)
8. Journal of Medical Internet Research (3.5%; top 2%; 85 papers in training set)
9. The Journal of Prevention of Alzheimer's Disease (3.0%; top 0.1%; 10 papers in training set)
10. Scientific Reports (2.8%; top 43%; 3102 papers in training set)
11. Frontiers in Digital Health (2.5%; top 0.4%; 20 papers in training set)
12. PLOS Digital Health (2.5%; top 1.0%; 91 papers in training set)
13. International Journal of Medical Informatics (2.3%; top 0.6%; 25 papers in training set)
14. Age and Ageing (2.0%; top 0.2%; 27 papers in training set)
15. Alzheimer's Research & Therapy (1.8%; top 1%; 52 papers in training set)
16. BMC Medical Research Methodology (1.7%; top 0.6%; 43 papers in training set)
17. PLOS ONE (1.6%; top 55%; 4510 papers in training set)
18. The Lancet Digital Health (1.2%; top 0.7%; 25 papers in training set)
19. Artificial Intelligence in Medicine (1.1%; top 0.5%; 15 papers in training set)
20. JMIR Public Health and Surveillance (0.9%; top 3%; 45 papers in training set)
21. Computers in Biology and Medicine (0.8%; top 4%; 120 papers in training set)
22. JMIR mHealth and uHealth (0.7%; top 0.5%; 10 papers in training set)
23. Journal of Alzheimer's Disease (0.7%; top 1%; 43 papers in training set)
24. iScience (0.7%; top 36%; 1063 papers in training set)
25. Translational Psychiatry (0.6%; top 5%; 219 papers in training set)
26. BMJ Health & Care Informatics (0.6%; top 1%; 13 papers in training set)
27. Annals of Neurology (0.6%; top 2%; 57 papers in training set)