
Can large language models approximate human perceptions of disease severity? An evaluation using Global Burden of Disease 2010 disability weights

Ha, Y.; Park, H.; Lee, Y.; Kim, S.; Ahn, S.

2026-05-04 health informatics
10.64898/2026.05.02.26352261 medRxiv

Background: Disability weights (DWs) quantify the severity of health loss and are essential for estimating disability-adjusted life years in the Global Burden of Disease (GBD) framework. Conventional DW estimation relies on resource-intensive population surveys that are difficult to update or adapt to emerging health states. Large language models (LLMs) may offer a scalable alternative by approximating human perceptions of disease severity through structured judgment tasks.

Methods: This exploratory study evaluated the alignment between LLM-derived and human-derived DW rankings using 222 health states from GBD 2010. All possible pairwise comparisons (24,531 pairs, each repeated three times) were conducted across four LLMs (GPT-5 mini, GPT-5, Claude Haiku 4.5, and Claude Sonnet 4.5). DWs were estimated via probit regression and evaluated using Spearman's rank correlation and Steiger's z test. The effects of prompt language (English vs. Korean), cultural role prompting, and medical specialist role prompting on alignment were examined. Additionally, the Binomial-Logit Indifference-Point (BLIP) estimator was proposed and validated through leave-one-out cross-validation for estimating DWs for health states without established values.

Results: All four LLMs showed high rank correlation with GBD 2010 DWs (Spearman's ρ = 0.893 to 0.909), with no significant inter-model differences. Korean-language prompting significantly improved alignment with Korean DWs (ρ = 0.756 vs. 0.715, p = 0.011), and Korean cultural role prompting improved alignment with both GBD 2010 DWs (ρ = 0.922 vs. 0.909, p = 0.002) and Korean DWs (ρ = 0.738 vs. 0.715, p = 0.001). Medical specialist role prompting significantly reduced alignment with GBD 2010 DWs (ρ = 0.895 vs. 0.909, p = 0.001). BLIP demonstrated strong agreement with GBD 2010 DWs (Pearson's r = 0.862, MAE = 0.066) and produced plausible estimates for Long COVID (mild: 0.020, moderate: 0.298, severe: 0.529).

Conclusions: LLMs can approximate human perceptions of disease severity with high rank-order consistency. Prompt language and role framing significantly influenced alignment, with culturally grounded lay prompting enhancing and specialist prompting reducing correspondence with population-based DWs. BLIP provides a practical framework for generating provisional DW estimates for emerging or underrepresented health states when conventional surveys are infeasible.
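The pairwise design and the rank-based evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`all_pairs`, `average_ranks`, `spearman_rho`) are hypothetical, health states are stood in for by integer indices, and the per-model total assumes that "each repeated three times" means three queries per pair. The probit regression, Steiger's z test, and the BLIP estimator are not reproduced here.

```python
from itertools import combinations
from math import sqrt

N_STATES = 222   # health states from GBD 2010, as stated in the abstract
REPEATS = 3      # assumed: each pair is queried three times per model


def all_pairs(n_states):
    """Return every unordered pair of health-state indices."""
    return list(combinations(range(n_states), 2))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


pairs = all_pairs(N_STATES)
print(len(pairs))            # 222 * 221 / 2 = 24531, matching the abstract
print(len(pairs) * REPEATS)  # 73593 comparisons per model under the assumption above
```

Given two equal-length lists of weights for the same health states (for example, LLM-derived and survey-derived DWs), `spearman_rho(llm_dw, human_dw)` would return the rank correlation of the kind reported in the Results; the variable names here are placeholders, not data from the study.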

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. BMC Medical Research Methodology (43 papers in training set; Top 0.1%; 9.8%)
2. Journal of Biomedical Informatics (45 papers in training set; Top 0.2%; 7.0%)
3. Journal of the American Medical Informatics Association (61 papers in training set; Top 0.5%; 6.2%)
4. npj Digital Medicine (97 papers in training set; Top 0.9%; 6.1%)
5. JMIR Public Health and Surveillance (45 papers in training set; Top 0.3%; 4.7%)
6. Journal of Medical Internet Research (85 papers in training set; Top 1.0%; 4.7%)
7. PLOS ONE (4510 papers in training set; Top 34%; 4.2%)
8. European Journal of Epidemiology (40 papers in training set; Top 0.2%; 3.5%)
9. JAMIA Open (37 papers in training set; Top 0.5%; 3.5%)
10. JMIR Medical Informatics (17 papers in training set; Top 0.4%; 3.5%)

50% of probability mass above

11. Scientific Reports (3102 papers in training set; Top 39%; 3.5%)
12. The Lancet Digital Health (25 papers in training set; Top 0.2%; 3.5%)
13. BMJ Open (554 papers in training set; Top 7%; 2.3%)
14. International Journal of Medical Informatics (25 papers in training set; Top 0.6%; 2.3%)
15. PLOS Digital Health (91 papers in training set; Top 1%; 1.8%)
16. Nature Communications (4913 papers in training set; Top 53%; 1.6%)
17. American Journal of Epidemiology (57 papers in training set; Top 0.8%; 1.6%)
18. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 35%; 1.6%)
19. BMC Public Health (147 papers in training set; Top 4%; 1.6%)
20. Frontiers in Public Health (140 papers in training set; Top 5%; 1.4%)
21. BMC Medical Informatics and Decision Making (39 papers in training set; Top 2%; 1.3%)
22. eClinicalMedicine (55 papers in training set; Top 0.9%; 1.3%)
23. BMC Medicine (163 papers in training set; Top 5%; 1.2%)
24. Healthcare (16 papers in training set; Top 1%; 1.2%)
25. JAMA Network Open (127 papers in training set; Top 3%; 1.1%)
26. eBioMedicine (130 papers in training set; Top 3%; 0.9%)
27. BMJ Health & Care Informatics (13 papers in training set; Top 0.8%; 0.9%)
28. Annals of Internal Medicine (27 papers in training set; Top 0.8%; 0.9%)
29. Philosophical Transactions of the Royal Society B (51 papers in training set; Top 5%; 0.9%)
30. Epidemiology and Infection (84 papers in training set; Top 3%; 0.9%)
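
The 50% cutoff stated above can be checked from the listed percentages with a short Python sketch. The function name `ranks_to_reach` is hypothetical, and the values are the rounded figures shown on this page, so the sums are approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1 to 30, copied from the list above.
probs = [9.8, 7.0, 6.2, 6.1, 4.7, 4.7, 4.2, 3.5, 3.5, 3.5,
         3.5, 3.5, 2.3, 2.3, 1.8, 1.6, 1.6, 1.6, 1.6, 1.4,
         1.3, 1.3, 1.2, 1.2, 1.1, 0.9, 0.9, 0.9, 0.9, 0.9]


def ranks_to_reach(probabilities, threshold):
    """Return the first rank at which the cumulative sum reaches the threshold."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None  # threshold not reached within the listed entries


print(ranks_to_reach(probs, 50.0))  # 10
```

With these rounded values, the first nine journals sum to roughly 49.7% and the first ten to roughly 53.2%, which is consistent with the cutoff falling after rank 10. The 30 listed journals together cover about 85% of the predicted mass.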