
Regression vs. Medical LLMs: A Comprehensive Study for CVD and Mortality Risk Prediction

KOM SANDE, S. D.; Skorski, M.; Theobald, M.; Schneider, J.; Marz, W.

medRxiv preprint · health informatics · 2026-03-11
DOI: 10.64898/2026.03.11.26347789

Cardiovascular diseases (CVDs) remain the foremost cause of global morbidity and mortality, driving an urgent need for robust predictive tools that enable early detection and preventive intervention. Traditional regression-based models, such as linear and logistic regression, regression trees and forests, and Support Vector Machines (SVMs), have long underpinned CVD risk estimation but often assume linear relationships, homogeneous effects across populations, and a limited number of predictors. Recent advances in regression, such as bagging and boosting, as well as Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs), are increasingly shifting this paradigm. In this paper, we review key developments in both classic regression techniques and recent GenAI approaches, with a particular focus on openly available Medical LLMs (MedLLMs) combined with few-shot prompting and classification finetuning. Based on the LURIC cardiovascular health study, we investigate a broad variety of biomarkers and risk factors across two cohorts of 3,316 CVD risk patients who underwent coronary angiography in Germany between 1997 and 2000. Our results demonstrate that large, pretrained MedLLMs (70B) achieve up to 82% AUROC for 1-year all-cause mortality (1YM) prediction with optimized few-shot prompting, performing competitively with recent regression techniques and state-of-the-art methods from the medical literature such as CoroPredict, SMART, and SCORE2. Smaller models (8B) can be finetuned to match or even surpass their larger counterparts as well as commercial models like ClaudeSonnet-4.5 and ChatGPT-5.2. Among all evaluated approaches, the best-performing boosting-based regression technique (CatBoost) and commercial LLM (Gemini-3-Flash) both achieve an AUROC of up to 85%.
Further calibration and stratification analyses reveal that MedLLMs systematically over-predict mortality (ECE: 0.05-0.10), while Platt scaling reduces these miscalibrations by 60-90%.
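The calibration analysis mentioned above can be sketched in a few lines: compute the Expected Calibration Error (ECE) of raw probabilities, then refit them with Platt scaling (a logistic regression on the model's log-odds). The snippet below is a minimal illustration on synthetic data, not the LURIC cohort; all variable names and the binning scheme are illustrative assumptions.

```python
# Minimal sketch: ECE measurement and Platt scaling on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def expected_calibration_error(y_true, p_pred, n_bins=10):
    """Equal-width-bin ECE: |mean confidence - observed rate|, weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p_pred >= lo) & (p_pred < hi)
        if mask.any():
            ece += mask.mean() * abs(p_pred[mask].mean() - y_true[mask].mean())
    return ece

# Synthetic binary mortality labels and informative but over-confident scores.
y = rng.integers(0, 2, size=2000)
score = 2.5 * (y - 0.5) + rng.normal(0.0, 1.5, size=2000)
p_raw = 1.0 / (1.0 + np.exp(-(0.4 * score + 0.8)))  # +0.8 shift -> systematic over-prediction

# Platt scaling: fit a logistic regression on the log-odds of the raw probabilities.
z = np.log(p_raw / (1.0 - p_raw)).reshape(-1, 1)
platt = LogisticRegression().fit(z, y)
p_cal = platt.predict_proba(z)[:, 1]

print(f"ECE before Platt scaling: {expected_calibration_error(y, p_raw):.3f}")
print(f"ECE after Platt scaling:  {expected_calibration_error(y, p_cal):.3f}")
```

In practice the Platt parameters are fit on a held-out calibration split rather than in-sample, as done here for brevity; the mechanics are otherwise identical.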

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank  Journal                                            Papers in training set  Percentile  Probability
   1  Medical Image Analysis                                                 33  Top 0.1%        17.4%
   2  Scientific Reports                                                   3102  Top 4%          12.4%
   3  Nature Communications                                                4913  Top 21%          9.1%
   4  Communications Medicine                                                85  Top 0.1%         4.8%
   5  Computers in Biology and Medicine                                     120  Top 0.9%         3.6%
   6  npj Digital Medicine                                                   97  Top 1%           3.6%
      (50% of probability mass above)
   7  Communications Biology                                                886  Top 3%           3.2%
   8  Patterns                                                               70  Top 0.4%         2.6%
   9  The Lancet Digital Health                                              25  Top 0.3%         1.9%
  10  Journal of the American Heart Association                             119  Top 3%           1.9%
  11  Frontiers in Artificial Intelligence                                   18  Top 0.2%         1.8%
  12  PLOS ONE                                                             4510  Top 54%          1.7%
  13  NeuroImage: Clinical                                                  132  Top 2%           1.7%
  14  Nature Medicine                                                       117  Top 2%           1.7%
  15  Nature Machine Intelligence                                            61  Top 2%           1.5%
  16  eBioMedicine                                                          130  Top 2%           1.5%
  17  Frontiers in Immunology                                               586  Top 5%           1.3%
  18  Nature Biomedical Engineering                                          42  Top 1%           1.3%
  19  Advanced Science                                                      249  Top 14%          1.2%
  20  NeuroImage                                                            813  Top 5%           1.1%
  21  European Heart Journal - Digital Health                                15  Top 0.5%         0.9%
  22  Journal of Medical Internet Research                                   85  Top 4%           0.9%
  23  European Respiratory Journal                                           54  Top 2%           0.9%
  24  EMBO Molecular Medicine                                                85  Top 4%           0.8%
  25  iScience                                                             1063  Top 29%          0.8%
  26  Journal of Biomedical Informatics                                      45  Top 1%           0.7%
  27  IEEE Journal of Biomedical and Health Informatics                      34  Top 2%           0.7%
  28  Bioinformatics                                                       1061  Top 10%          0.7%
  29  Human Brain Mapping                                                   295  Top 4%           0.7%
  30  BMJ Health & Care Informatics                                          13  Top 1%           0.6%