
Large Language Models forecast Patient Health Trajectories enabling Digital Twins

Makarov, N.; Bordukova, M.; Rodriguez-Esteban, R.; Schmich, F.; Menden, M. P.

2024-08-16 health informatics
10.1101/2024.07.05.24309957 medRxiv

Background: Generative artificial intelligence (AI) accelerates the development of digital twins, which enable virtual representations of real patients to explore, predict and simulate patient health trajectories, ultimately aiding treatment selection and clinical trial design. Recent advances in forecasting with generative AI, in particular large language models (LLMs), highlight untapped potential to overcome real-world data (RWD) challenges such as missingness, noise and limited sample sizes, thus empowering the next generation of AI algorithms in healthcare.

Methods: We developed the Digital Twin - Generative Pretrained Transformer (DT-GPT) model, which applies biomedical LLMs to rich electronic health record (EHR) data. Our method eliminates the need for data imputation and normalization, enables forecasting of clinical variables, and offers preliminary explainability through a human-interpretable interface. We benchmarked DT-GPT on RWD, including a long-term US nationwide non-small cell lung cancer (NSCLC) dataset and a short-term Intensive Care Unit (ICU) dataset.

Findings: DT-GPT surpassed state-of-the-art machine learning methods in patient trajectory forecasting on mean absolute error (MAE) for both the long-term (3.4% MAE improvement) and the short-term (1.3% MAE improvement) datasets. Additionally, DT-GPT preserved cross-correlations of clinical variables (average R² of 0.98) while handling data missingness and noise. Finally, we discovered that DT-GPT can provide insights into a forecast's rationale and perform zero-shot forecasting on variables not used during fine-tuning, outperforming even fully trained task-specific machine learning models on 13 clinical variables.

Interpretation: DT-GPT demonstrates that LLMs can serve as a robust medical forecasting platform, empowering digital twins that virtually replicate patient characteristics beyond their training data. We envision that LLM-based digital twins will enable a variety of use cases, including clinical trial simulations, treatment selection and adverse event mitigation.
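The Findings report two headline metrics: MAE on forecast trajectories and an R² comparing the cross-correlation structure of observed versus forecast clinical variables. A minimal sketch of how such metrics can be computed, on synthetic data; all array names, shapes, and the noise level are illustrative assumptions, not DT-GPT's actual evaluation code:

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(size=(100, 5))  # 100 time points, 5 clinical variables (synthetic)
forecast = observed + rng.normal(scale=0.1, size=observed.shape)  # a close forecast

# MAE across all variables and time points
mae = np.mean(np.abs(observed - forecast))

# Cross-correlation preservation: build the variable-by-variable correlation
# matrix for each series, then compare their off-diagonal entries via R².
corr_obs = np.corrcoef(observed, rowvar=False)
corr_fc = np.corrcoef(forecast, rowvar=False)
x = corr_obs[np.triu_indices(5, k=1)]  # observed pairwise correlations
y = corr_fc[np.triu_indices(5, k=1)]   # forecast pairwise correlations
ss_res = np.sum((y - x) ** 2)
ss_tot = np.sum((x - x.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
```

With a near-perfect forecast, as here, the correlation matrices almost coincide and `r2` approaches 1, which is the regime the paper's reported average R² of 0.98 describes.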

Matching journals

The top-ranked journal alone accounts for over 50% of the predicted probability mass.

Rank  Journal                                                   Papers in training set  Percentile  Probability
1     npj Digital Medicine                                      97                      Top 0.1%    65.2%
2     Journal of the American Medical Informatics Association   61                      Top 0.5%    6.4%
3     Journal of Biomedical Informatics                         45                      Top 0.3%    4.2%
4     Nature Biomedical Engineering                             42                      Top 0.8%    1.7%
5     Patterns                                                  70                      Top 0.9%    1.7%
6     PLOS Digital Health                                       91                      Top 1%      1.7%
7     Nature Machine Intelligence                               61                      Top 2%      1.3%
8     Scientific Reports                                        3102                    Top 63%     1.3%
9     IEEE Journal of Biomedical and Health Informatics         34                      Top 1%      1.1%
10    Frontiers in Digital Health                               20                      Top 1%      1.0%
11    JMIR Medical Informatics                                  17                      Top 1%      0.9%
12    Nature Medicine                                           117                     Top 4%      0.8%
13    International Journal of Medical Informatics              25                      Top 2%      0.8%
14    The Lancet Digital Health                                 25                      Top 1%      0.8%
15    BMC Medical Informatics and Decision Making               39                      Top 3%      0.7%
16    Advanced Science                                          249                     Top 20%     0.7%
17    Journal of Medical Internet Research                      85                      Top 5%      0.7%
18    Communications Medicine                                   85                      Top 1%      0.7%
19    Nature Communications                                     4913                    Top 67%     0.5%
20    JCO Clinical Cancer Informatics                           18                      Top 1%      0.5%
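The 50%-of-probability-mass cutoff can be reproduced from the listed probabilities alone: walk down the ranked journals, accumulate their probabilities, and stop once the running total reaches 50%. A small sketch (the values are the top-20 probabilities listed above; the tail beyond rank 20 is truncated, so they sum to less than 100%):

```python
# Top-20 predicted probabilities (%), in rank order, from the list above
probs = [65.2, 6.4, 4.2, 1.7, 1.7, 1.7, 1.3, 1.3, 1.1, 1.0,
         0.9, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.5, 0.5]

cumulative = 0.0
for rank, p in enumerate(probs, start=1):
    cumulative += p
    if cumulative >= 50.0:
        break  # the top journal alone (65.2%) already exceeds 50%

print(rank, cumulative)
```

Because the first entry carries 65.2% on its own, the cutoff is reached at rank 1, matching the note under "Matching journals".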