Enhancing Medical Knowledge in Large Language Models via Supervised Continued Pretraining on Clinical Notes

Weissenbacher, D.; Shabbir, M.; Campbell, I. M.; Berdahl, C. T.; Gonzalez-Hernandez, G.

2026-04-04 health informatics
10.64898/2026.04.02.26350065 medRxiv

Background: Large language models (LLMs) contain limited professional medical knowledge, as large-scale training on clinical text has not yet been possible due to restricted access. Objectives: To continue pretraining an open-access instruct LLM on de-identified medical notes and evaluate the resulting impact on real-world clinical decision-making tasks and standard benchmarks. Methods: Using 500K de-identified clinical notes from Cedars-Sinai Health System, we fine-tuned a Qwen3-4B Instruct model with supervised learning to generate medical decision-making (MDM) paragraphs from patient presentations, and evaluated it on assigned-diagnosis prediction, in-hospital cardiac-arrest mention detection, and a suite of general and biomedical benchmarks. Results: The fine-tuned model produced MDMs that closely resembled those written by physicians and outperformed the base instruct model and larger clinically untrained models (Qwen3-32B and Llama-3.1-405B Instruct) on assigned-diagnosis prediction, the task most aligned with its training objective. On the task of detecting in-hospital cardiac-arrest mentions, the model initially exhibited mild label collapse, but a brief task-specific fine-tuning stage resolved this issue and allowed it to surpass all competitors. The model also broadly retained general knowledge relative to the baseline on biomedical and general-domain evaluation benchmarks. Conclusion: Supervised full fine-tuning on clinical notes allowed the model to incorporate medical knowledge and transfer it to unseen biomedical tasks without wholesale loss of general-domain abilities, while revealing collapse-related failure modes that motivate more principled strategies for clinical specialization.
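The abstract does not state which training framework or hyperparameters were used. The following is a minimal sketch of the kind of supervised full fine-tuning it describes, assuming Hugging Face TRL's SFTTrainer, a hypothetical two-field training record (patient presentation, physician-written MDM paragraph), an assumed Qwen3-4B Instruct checkpoint name, and illustrative placeholder hyperparameters; it is not the authors' implementation.

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical training pairs: each de-identified patient presentation is mapped to the
# physician-written medical decision-making (MDM) paragraph from the same clinical note.
train_ds = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "Patient presentation: 72-year-old with acute dyspnea ..."},
        {"role": "assistant", "content": "MDM: Findings are most consistent with ..."},
    ]},
])

# Full-parameter supervised fine-tuning (no adapters), in the spirit of the paper's
# "supervised full fine-tuning"; all hyperparameters are placeholders, not the authors' settings.
config = SFTConfig(
    output_dir="qwen3-4b-mdm-sft",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B-Instruct-2507",  # assumed checkpoint id; abstract only says "Qwen3-4B Instruct"
    args=config,
    train_dataset=train_ds,
)
trainer.train()
```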

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Journal of Biomedical Informatics: 10.1% (45 papers in training set, Top 0.1%)
2. Journal of the American Medical Informatics Association: 8.4% (61 papers in training set, Top 0.3%)
3. npj Digital Medicine: 6.4% (97 papers in training set, Top 0.8%)
4. Scientific Reports: 6.4% (3102 papers in training set, Top 18%)
5. BMC Medical Informatics and Decision Making: 4.8% (39 papers in training set, Top 0.6%)
6. Artificial Intelligence in Medicine: 4.3% (15 papers in training set, Top 0.1%)
7. JCO Clinical Cancer Informatics: 4.0% (18 papers in training set, Top 0.2%)
8. International Journal of Medical Informatics: 4.0% (25 papers in training set, Top 0.3%)
9. Biology Methods and Protocols: 3.7% (53 papers in training set, Top 0.3%)
(50% of predicted probability mass above this point)
10. Bioinformatics: 3.6% (1061 papers in training set, Top 5%)
11. PLOS Digital Health: 3.1% (91 papers in training set, Top 0.9%)
12. Journal of Medical Internet Research: 2.1% (85 papers in training set, Top 2%)
13. PLOS ONE: 2.1% (4510 papers in training set, Top 48%)
14. Frontiers in Artificial Intelligence: 1.9% (18 papers in training set, Top 0.2%)
15. Frontiers in Digital Health: 1.8% (20 papers in training set, Top 0.6%)
16. Nature Machine Intelligence: 1.8% (61 papers in training set, Top 2%)
17. JMIR Medical Informatics: 1.7% (17 papers in training set, Top 0.7%)
18. JAMIA Open: 1.7% (37 papers in training set, Top 0.8%)
19. Computers in Biology and Medicine: 1.5% (120 papers in training set, Top 3%)
20. Nature Medicine: 1.5% (117 papers in training set, Top 3%)
21. The Lancet Digital Health: 1.3% (25 papers in training set, Top 0.6%)
22. eBioMedicine: 1.2% (130 papers in training set, Top 2%)
23. Nature Communications: 1.2% (4913 papers in training set, Top 56%)
24. iScience: 1.2% (1063 papers in training set, Top 22%)
25. Healthcare: 1.1% (16 papers in training set, Top 1%)
26. Patterns: 0.9% (70 papers in training set, Top 2%)
27. BMC Bioinformatics: 0.9% (383 papers in training set, Top 6%)
28. BioData Mining: 0.8% (15 papers in training set, Top 0.8%)
29. Cureus: 0.7% (67 papers in training set, Top 5%)
30. Med: 0.7% (38 papers in training set, Top 0.8%)
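As a quick illustration of the 50% cutoff stated above the list, the sketch below sums the listed probabilities in rank order until half of the predicted probability mass is covered; the values are copied from the top entries of the list (rounded as shown), so the exact cumulative figure is approximate.

```python
# Predicted journal probabilities, in rank order, as listed above.
journal_probs = [
    ("Journal of Biomedical Informatics", 0.101),
    ("Journal of the American Medical Informatics Association", 0.084),
    ("npj Digital Medicine", 0.064),
    ("Scientific Reports", 0.064),
    ("BMC Medical Informatics and Decision Making", 0.048),
    ("Artificial Intelligence in Medicine", 0.043),
    ("JCO Clinical Cancer Informatics", 0.040),
    ("International Journal of Medical Informatics", 0.040),
    ("Biology Methods and Protocols", 0.037),
]

# Accumulate probability mass until the 50% threshold is crossed.
cumulative = 0.0
for rank, (name, prob) in enumerate(journal_probs, start=1):
    cumulative += prob
    if cumulative >= 0.50:
        print(f"Top {rank} journals cover {cumulative:.1%} of the probability mass")
        break
```

Running this prints that the top 9 journals cover roughly 52% of the mass, matching the stated cutoff.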