A Longitudinal Clinical Foundation Model on Nationwide Veteran Health Trajectories

Zamora-Resendiz, R.; Yin, J.; Kimbrel, N. A.; Beckham, J. C.; Crivelli, S.

2026-05-17 health informatics
10.64898/2026.05.13.26353133 medRxiv
We present VA-LLM, a 1.62-billion-parameter autoregressive transformer pre-trained from scratch on 1.74 trillion tokens of clinical text spanning 22 years of care for 13.8 million patients in the Veterans Health Administration, with mortality outcomes confirmed through the National Death Index for 7.8 million patients. In a retrospective-prospective evaluation on 107,555 withheld patients, VA-LLM achieved higher 5-year AUPRC than Llama-2 (7 billion parameters), BioGPT-Large (1.57 billion parameters), and GatorTron (3.91 billion parameters), matching GatorTron's 100,000-patient performance with only 10,000 labeled patients. In a clinical validation against the VA's operational Care Assessment Need (CAN) score on 5.5 million patients one year beyond the pre-training corpus, VA-LLM achieved a 90-day mortality AUROC of 90.00% versus 87.74% (p < 0.001) and a 45% relative improvement in AUPRC; post-hoc recalibration recovered calibration comparable to CAN (Brier 0.0091 versus 0.0093) without sacrificing discrimination. Across 21 pre-training checkpoints, discriminative performance correlated more strongly with cumulative mortality experience (CME), the total person-years contributed by patients with confirmed deaths, than with token count (ΔR² = 0.15; Williams p < 10⁻⁶). Performance plateaued once marginal cohorts added fewer confirmed deaths, even as pre-training loss continued to decrease. These findings suggest that the clinical composition of pre-training data, particularly the completeness of documented patient trajectories, correlates with predictive performance more closely than corpus size alone.
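
The abstract defines cumulative mortality experience (CME) as the total person-years contributed by patients with confirmed deaths. A minimal sketch of that sum follows. The `Patient` record, its field names, and the toy cohort are hypothetical illustrations; how the authors actually measure person-years per patient is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout, not the paper's schema.
    years_observed: float   # assumed length of the documented trajectory, in years
    death_confirmed: bool   # True if a death is confirmed for this patient


def cumulative_mortality_experience(patients):
    """Sum person-years over patients with a confirmed death only."""
    return sum(p.years_observed for p in patients if p.death_confirmed)


# Toy cohort with made-up values, for illustration only.
cohort = [
    Patient(years_observed=12.0, death_confirmed=True),
    Patient(years_observed=3.5, death_confirmed=False),
    Patient(years_observed=8.0, death_confirmed=True),
]

cme = cumulative_mortality_experience(cohort)
print(cme)
```

Under this reading, adding patients without a confirmed death increases the token count but leaves CME unchanged, which is the distinction the abstract draws between corpus size and clinical composition.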

Matching journals

The top 4 journals together account for at least 50% of the predicted probability mass.

1. npj Digital Medicine (97 papers in training set; Top 0.1%): 27.8%
2. Nature Medicine (117 papers in training set; Top 0.1%): 14.4%
3. Nature Communications (4913 papers in training set; Top 29%): 6.4%
4. Scientific Reports (3102 papers in training set; Top 18%): 6.4%
50% of probability mass above
5. Nature Machine Intelligence (61 papers in training set; Top 0.7%): 4.3%
6. Nature Biomedical Engineering (42 papers in training set; Top 0.3%): 3.6%
7. Med (38 papers in training set; Top 0.1%): 3.1%
8. Science Translational Medicine (111 papers in training set; Top 1%): 2.6%
9. The Lancet Digital Health (25 papers in training set; Top 0.3%): 2.1%
10. Science Advances (1098 papers in training set; Top 17%): 1.7%
11. Communications Medicine (85 papers in training set; Top 0.3%): 1.7%
12. PLOS ONE (4510 papers in training set; Top 54%): 1.7%
13. Journal of the American Medical Informatics Association (61 papers in training set; Top 1%): 1.5%
14. Cell Reports Medicine (140 papers in training set; Top 4%): 1.5%
15. JCO Clinical Cancer Informatics (18 papers in training set; Top 0.6%): 1.3%
16. Nature Neuroscience (216 papers in training set; Top 5%): 1.0%
17. Patterns (70 papers in training set; Top 2%): 0.9%
18. Annals of Internal Medicine (27 papers in training set; Top 0.8%): 0.9%
19. Journal of Biomedical Informatics (45 papers in training set; Top 1%): 0.8%
20. eBioMedicine (130 papers in training set; Top 4%): 0.8%
21. Communications Biology (886 papers in training set; Top 23%): 0.8%
22. Nature Computational Science (50 papers in training set; Top 2%): 0.8%
23. Nature Human Behaviour (85 papers in training set; Top 6%): 0.5%
24. Advanced Science (249 papers in training set; Top 24%): 0.5%
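
The 50% cutoff in the ranking can be checked by accumulating the listed probabilities in rank order. The sketch below does this; the function name `rank_reaching` is an illustrative choice, and the probabilities are copied from the list above.

```python
# Predicted probabilities (%) for ranks 1-24, copied from the listing.
probs = [27.8, 14.4, 6.4, 6.4, 4.3, 3.6, 3.1, 2.6, 2.1, 1.7, 1.7, 1.7,
         1.5, 1.5, 1.3, 1.0, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.5, 0.5]


def rank_reaching(probs, threshold):
    """Return (rank, cumulative mass) at the first rank whose running total meets the threshold."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total  # threshold never reached


rank, mass = rank_reaching(probs, 50.0)
print(rank, round(mass, 1))  # prints: 4 55.0
```

The running total is 48.6% after three journals and 55.0% after four, so rank 4 is the first point at which the cumulative mass passes 50%.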