
Augmenting Structured Diagnoses through Effective Use of Pre-trained Large Language Models on Clinical Notes

Razzaghi, H.; Nguyen, N.; Pargi, M.; Wieand, K.; Bunnell, T.; Bailey, C.

2026-06-02 health informatics
10.64898/2026.05.30.26354533 medRxiv

Objective: Clinical narrative provides a unique window into provider reasoning and attribution, but its use has been limited by resource requirements and the need for extensive fine-tuning, and LLMs in particular have traditionally performed poorly at medical coding. We optimize and evaluate a reproducible method for automated diagnosis assignment from clinical notes using LLMs, and compare it with structured diagnoses from the EHR.

Methods: We applied prompt engineering and task segmentation to GPT-OSS to create a model that extracts ICD-10-CM diagnoses, with estimates of severity, currency, and importance, from progress notes. We assessed performance across multiple cohorts of patients aged 0-21 years. For each cohort, 100 outpatient provider notes were selected across levels of severity, along with the diagnoses coded in the EHR for that visit; a subset of 130 notes was subjected to clinical expert review.

Results: Comparison showed 18.7% exact-code and 33.3% ICD-10-CM category-level agreement between EHR and LLM diagnoses, but a semantic similarity of 0.93 at the category level. Compared with expert review, LLM precision was 0.84 and recall 0.49 for exact matches, and 0.92 and 0.62, respectively, for category-level matches. In contrast, EHR-coded diagnoses showed slightly higher precision (0.94 at both levels) and substantially lower recall (0.27 and 0.43, respectively) against expert review. Codes missed by the LLM were more often rated by the reviewer as lower in importance or certainty.

Conclusion: We demonstrate a reusable approach to optimizing a pretrained LLM for diagnosis extraction from clinical notes, enabling large-scale diagnosis screening by LLMs without expensive study-specific model refinement.
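The abstract reports precision and recall at two levels: exact ICD-10-CM code and ICD-10-CM category (the first three characters of the code). The scoring code itself is not described, so the following is only a minimal sketch under stated assumptions: codes are compared after removing the dot, duplicate codes within a note collapse to one, and counts are pooled across notes (micro-averaging). The function names and the example codes are hypothetical and illustrative, not taken from the study.

```python
from collections.abc import Iterable, Sequence

# Hypothetical data shape: one (predicted codes, reference codes) pair per note.
Note = tuple[Iterable[str], Iterable[str]]


def normalize(code: str, level: str) -> str:
    """Normalize an ICD-10-CM code; 'category' keeps only the 3-character category."""
    compact = code.replace(".", "").strip().upper()
    if level == "exact":
        return compact
    if level == "category":
        return compact[:3]
    raise ValueError(f"unknown level: {level!r}")


def micro_precision_recall(notes: Sequence[Note], level: str = "exact") -> tuple[float, float]:
    """Pool true positives and code counts over all notes (assumed micro-averaging)."""
    tp = n_pred = n_ref = 0
    for predicted, reference in notes:
        # Sets collapse duplicate codes (or duplicate categories) within a note.
        pred = {normalize(c, level) for c in predicted}
        ref = {normalize(c, level) for c in reference}
        tp += len(pred & ref)
        n_pred += len(pred)
        n_ref += len(ref)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_ref if n_ref else 0.0
    return precision, recall


# Hypothetical example: LLM output vs. expert reference for two notes.
notes = [
    (["J45.909", "R05.9"], ["J45.20", "R05.9", "E66.9"]),
    ([], ["Z00.129"]),
]
print(micro_precision_recall(notes, "exact"))     # (0.5, 0.25)
print(micro_precision_recall(notes, "category"))  # (1.0, 0.5)
```

The example shows why category-level scores exceed exact-match scores: "J45.909" and "J45.20" disagree on the full code but share the category J45.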

Matching journals

The top 2 journals account for more than 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association (61 papers in training set; Top 0.1%): 39.7%
2. Journal of Biomedical Informatics (45 papers in training set; Top 0.1%): 12.6%
50% of probability mass above
3. JAMIA Open (37 papers in training set; Top 0.2%): 6.4%
4. npj Digital Medicine (97 papers in training set; Top 0.8%): 6.4%
5. BMC Medical Informatics and Decision Making (39 papers in training set; Top 0.7%): 4.0%
6. International Journal of Medical Informatics (25 papers in training set; Top 0.4%): 3.6%
7. JMIR Medical Informatics (17 papers in training set; Top 0.3%): 3.6%
8. BMJ Health & Care Informatics (13 papers in training set; Top 0.4%): 1.8%
9. Frontiers in Digital Health (20 papers in training set; Top 0.6%): 1.7%
10. PLOS Digital Health (91 papers in training set; Top 2%): 1.5%
11. PLOS ONE (4510 papers in training set; Top 56%): 1.5%
12. BMC Medical Research Methodology (43 papers in training set; Top 0.8%): 1.3%
13. JCO Clinical Cancer Informatics (18 papers in training set; Top 0.6%): 1.2%
14. Journal of Medical Internet Research (85 papers in training set; Top 4%): 1.0%
15. The Lancet Digital Health (25 papers in training set; Top 0.8%): 1.0%
16. Scientific Reports (3102 papers in training set; Top 70%): 0.9%
17. iScience (1063 papers in training set; Top 28%): 0.8%
18. JAMA Pediatrics (10 papers in training set; Top 0.2%): 0.8%
19. IEEE Journal of Biomedical and Health Informatics (34 papers in training set; Top 2%): 0.8%
20. Heliyon (146 papers in training set; Top 7%): 0.7%
21. Inflammatory Bowel Diseases (15 papers in training set; Top 0.3%): 0.7%
22. Journal of General Internal Medicine (20 papers in training set; Top 1%): 0.6%