Identifying and Characterizing Gallstone Disease from Clinical Narratives with Zero-shot Learning and Automated Prompt Optimization
Hwang, S.; Wang, A.; Batugo, A.; Kaplan, D. E.; Rader, D.; Mowery, D.; Lim, J.
2026-01-30
health informatics
10.64898/2026.01.29.26345132
medRxiv
We built and evaluated a zero-shot LLM pipeline with automated, task-aware prompt optimization to extract radiology and symptom fields for gallstone phenotyping from de-identified EHR text. Across symptomatic, asymptomatic, and control cohorts, it performed reliably on high-signal binary fields and symptom flags but lagged on fine-grained stone burden and complications, establishing a practical baseline and motivating targeted refinements.
Matching journals
● Non-profit
◐ University press
○ Commercial
The top 6 journals account for 50% of the predicted probability mass.
1 | npj Digital Medicine | ○ | 97 papers in training set | Top 0.3% | 14.6%
2 | Nature Biomedical Engineering | ○ | 42 papers in training set | Top 0.1% | 12.7%
3 | Nature Communications | ○ | 4913 papers in training set | Top 17% | 10.2%
4 | Nature Medicine | ○ | 117 papers in training set | Top 0.2% | 6.9%
5 | Nature Machine Intelligence | ○ | 61 papers in training set | Top 0.5% | 4.9%
6 | Science Translational Medicine | ● | 111 papers in training set | Top 0.5% | 4.4%
50% of probability mass above
7 | Nature Methods | ○ | 336 papers in training set | Top 3% | 3.6%
8 | eLife | ● | 5422 papers in training set | Top 29% | 3.1%
9 | Med | ○ | 38 papers in training set | Top 0.1% | 2.8%
10 | Scientific Reports | ○ | 3102 papers in training set | Top 47% | 2.4%
11 | The Lancet Digital Health | ○ | 25 papers in training set | Top 0.2% | 2.1%
12 | Patterns | ○ | 70 papers in training set | Top 0.6% | 1.9%
13 | Science Advances | ● | 1098 papers in training set | Top 17% | 1.7%
14 | Cell Reports Medicine | ○ | 140 papers in training set | Top 3% | 1.7%
15 | Nature | ○ | 575 papers in training set | Top 10% | 1.7%
16 | PLOS Digital Health | ● | 91 papers in training set | Top 2% | 1.3%
17 | Communications Medicine | ○ | 85 papers in training set | Top 0.5% | 1.2%
18 | Nature Biotechnology | ○ | 147 papers in training set | Top 6% | 1.2%
19 | Science | ● | 429 papers in training set | Top 17% | 1.0%
20 | JCO Clinical Cancer Informatics | ● | 18 papers in training set | Top 0.7% | 0.9%
21 | Cell | ○ | 370 papers in training set | Top 15% | 0.9%
22 | PLOS ONE | ● | 4510 papers in training set | Top 66% | 0.8%
23 | Communications Biology | ○ | 886 papers in training set | Top 21% | 0.8%
24 | Advanced Science | ○ | 249 papers in training set | Top 17% | 0.8%
25 | eBioMedicine | ○ | 130 papers in training set | Top 5% | 0.7%
26 | Nature Computational Science | ○ | 50 papers in training set | Top 2% | 0.5%
27 | Annals of Internal Medicine | ● | 27 papers in training set | Top 1% | 0.5%
28 | Journal of the American Medical Informatics Association | ◐ | 61 papers in training set | Top 2% | 0.5%
29 | Modern Pathology | ○ | 21 papers in training set | Top 0.6% | 0.5%
30 | GigaScience | ◐ | 172 papers in training set | Top 4% | 0.5%
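
The 50% cutoff stated at the top of the list can be checked with a running sum over the ranked probabilities. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, the six values are copied from the listing, and it assumes the displayed percentages are rounded, so the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) of the six top-ranked journals, copied from the listing.
top_probs = [14.6, 12.7, 10.2, 6.9, 4.9, 4.4]

def journals_to_reach(probs, threshold=50.0):
    """Return (count, cumulative_sum) for the smallest ranked prefix reaching threshold.

    Hypothetical helper for illustration; not part of the listing's own tooling.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count, total
    raise ValueError("threshold is not reached by the given probabilities")

count, total = journals_to_reach(top_probs)
print(count, round(total, 1))
```

With these values the first five journals sum to roughly 49.3%, just short of the threshold, so the sixth entry is the one that carries the cumulative total past 50%.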