Identifying and Characterizing Gallstone Disease from Clinical Narratives with Zero-shot Learning and Automated Prompt Optimization

Hwang, S.; Wang, A.; Batugo, A.; Kaplan, D. E.; Rader, D.; Mowery, D.; Lim, J.

2026-01-30 health informatics
10.64898/2026.01.29.26345132 medRxiv

We built and evaluated a zero-shot LLM pipeline with automated, task-aware prompt optimization to extract radiology and symptom fields for gallstone phenotyping from de-identified EHR text. Across symptomatic, asymptomatic, and control cohorts, it performed reliably on high-signal binary fields and symptom flags but lagged on fine-grained stone burden and complications. These results establish a practical baseline and motivate targeted refinements.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | npj Digital Medicine | 97 | Top 0.3% | 14.6% |
| 2 | Nature Biomedical Engineering | 42 | Top 0.1% | 12.7% |
| 3 | Nature Communications | 4913 | Top 17% | 10.2% |
| 4 | Nature Medicine | 117 | Top 0.2% | 6.9% |
| 5 | Nature Machine Intelligence | 61 | Top 0.5% | 4.9% |
| 6 | Science Translational Medicine | 111 | Top 0.5% | 4.4% |

50% of probability mass above

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 7 | Nature Methods | 336 | Top 3% | 3.6% |
| 8 | eLife | 5422 | Top 29% | 3.1% |
| 9 | Med | 38 | Top 0.1% | 2.8% |
| 10 | Scientific Reports | 3102 | Top 47% | 2.4% |
| 11 | The Lancet Digital Health | 25 | Top 0.2% | 2.1% |
| 12 | Patterns | 70 | Top 0.6% | 1.9% |
| 13 | Science Advances | 1098 | Top 17% | 1.7% |
| 14 | Cell Reports Medicine | 140 | Top 3% | 1.7% |
| 15 | Nature | 575 | Top 10% | 1.7% |
| 16 | PLOS Digital Health | 91 | Top 2% | 1.3% |
| 17 | Communications Medicine | 85 | Top 0.5% | 1.2% |
| 18 | Nature Biotechnology | 147 | Top 6% | 1.2% |
| 19 | Science | 429 | Top 17% | 1.0% |
| 20 | JCO Clinical Cancer Informatics | 18 | Top 0.7% | 0.9% |
| 21 | Cell | 370 | Top 15% | 0.9% |
| 22 | PLOS ONE | 4510 | Top 66% | 0.8% |
| 23 | Communications Biology | 886 | Top 21% | 0.8% |
| 24 | Advanced Science | 249 | Top 17% | 0.8% |
| 25 | eBioMedicine | 130 | Top 5% | 0.7% |
| 26 | Nature Computational Science | 50 | Top 2% | 0.5% |
| 27 | Annals of Internal Medicine | 27 | Top 1% | 0.5% |
| 28 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.5% |
| 29 | Modern Pathology | 21 | Top 0.6% | 0.5% |
| 30 | GigaScience | 172 | Top 4% | 0.5% |
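
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is illustrative only: it assumes the displayed percentages are the (rounded) probabilities, and the function name `ranks_to_reach` is hypothetical, not part of the site.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-8, copied from the ranking above.
# Assumption: these displayed values are rounded to one decimal place.
probs = [14.6, 12.7, 10.2, 6.9, 4.9, 4.4, 3.6, 3.1]


def ranks_to_reach(probabilities, threshold=50.0):
    """Return the smallest number of top-ranked entries whose cumulative
    probability reaches `threshold`, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None


print(ranks_to_reach(probs))  # prints 6
```

The cumulative sum after five journals is about 49.3%, and the sixth brings it to about 53.7%, which is why the 50% marker sits after rank 6.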