Zero-shot Large Language Models for Long Clinical Text Summarization with Temporal Reasoning

Kruse, M.; Hu, S.; Derby, N.; Wu, Y.; Stonbraker, S.; Yao, B.; Wang, D.; Goldberg, E.; Gao, Y.

2025-07-23 health informatics
10.1101/2025.07.21.25331947 medRxiv
Recent advances in large language models (LLMs) have shown potential in clinical text summarization, but their ability to handle long patient trajectories with multi-modal data spread across time remains underexplored. This study systematically evaluates several state-of-the-art open-source LLMs, their retrieval-augmented generation (RAG) variants, and chain-of-thought (CoT) prompting on long-context clinical summarization and prediction. We examine their ability to synthesize structured and unstructured electronic health record (EHR) data while reasoning over temporal coherence, by re-engineering existing tasks, including discharge summarization and diagnosis prediction, from two publicly available EHR datasets. Our results indicate that long context windows improve input integration but do not consistently enhance clinical reasoning, and that LLMs still struggle with temporal progression and rare disease prediction. While RAG reduces hallucination in some cases, it does not fully address these limitations. Our work fills the gap in long clinical text summarization, establishing a foundation for evaluating LLMs with multi-modal data and temporal reasoning.

Matching journals

The top 2 journals together account for just over 50% of the predicted probability mass (33.4% + 18.9% = 52.3%).

1. Journal of Biomedical Informatics: 45 papers in training set, Top 0.1%, 33.4%
2. npj Digital Medicine: 97 papers in training set, Top 0.2%, 18.9%
(50% of probability mass above)
3. Journal of the American Medical Informatics Association: 61 papers in training set, Top 0.4%, 7.3%
4. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 0.4%, 4.0%
5. Scientific Reports: 3102 papers in training set, Top 35%, 3.6%
6. Artificial Intelligence in Medicine: 15 papers in training set, Top 0.1%, 3.1%
7. Journal of Medical Internet Research: 85 papers in training set, Top 2%, 2.6%
8. Computers in Biology and Medicine: 120 papers in training set, Top 1%, 2.4%
9. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 1%, 1.7%
10. iScience: 1063 papers in training set, Top 15%, 1.7%
11. Journal of Personalized Medicine: 28 papers in training set, Top 0.4%, 1.5%
12. PLOS ONE: 4510 papers in training set, Top 60%, 1.2%
13. JAMIA Open: 37 papers in training set, Top 1%, 1.1%
14. Med: 38 papers in training set, Top 0.5%, 1.0%
15. International Journal of Medical Informatics: 25 papers in training set, Top 1%, 1.0%
16. Frontiers in Digital Health: 20 papers in training set, Top 1%, 0.9%
17. Advanced Science: 249 papers in training set, Top 17%, 0.8%
18. The Lancet Digital Health: 25 papers in training set, Top 1%, 0.7%
19. Bioinformatics: 1061 papers in training set, Top 10%, 0.7%
20. JMIR Medical Informatics: 17 papers in training set, Top 2%, 0.5%
21. Nature Machine Intelligence: 61 papers in training set, Top 4%, 0.5%
22. GigaScience: 172 papers in training set, Top 4%, 0.5%
23. Nature Communications: 4913 papers in training set, Top 67%, 0.5%
24. NAR Genomics and Bioinformatics: 214 papers in training set, Top 5%, 0.5%
25. Frontiers in Psychiatry: 83 papers in training set, Top 4%, 0.5%
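
The 50% cutoff in the ranking can be checked from the listed probabilities. The snippet below is a minimal sketch, not the site's actual method: it assumes the cutoff is simply the smallest number of top-ranked journals whose probabilities sum to at least 50%, and the function name `journals_to_reach` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the five top-ranked journals,
# copied from the ranking above.
probs = [33.4, 18.9, 7.3, 4.0, 3.6]


def journals_to_reach(probs, threshold):
    """Return the smallest k whose top-k probabilities sum to >= threshold.

    Returns None if the threshold is never reached. This is an assumed
    definition of the cutoff, not one confirmed by the source page.
    """
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k
    return None


k = journals_to_reach(probs, 50.0)
print(k, round(sum(probs[:k]), 1))  # prints: 2 52.3
```

Under this assumption the first journal alone (33.4%) falls short, and adding the second brings the running total to 52.3%, which matches the marker placed after rank 2.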