Generation and Evaluation of Realistic Synthetic Clinical Progress Notes for Prostate Cancer using Large Language Models.

Rey-Blanes, A.; Veredas-Morente, J.; Vivas-Vargas, E.; Gil-Garcia, F.; Moreno-Barea, F. J.; Veredas, F. J.

2026-05-28 health informatics
10.64898/2026.05.25.26354027 medRxiv

Background and Objective: Access to real-world electronic health records (EHRs) remains limited by privacy, governance and annotation constraints, hindering the development of clinical natural language processing models. Realistic synthetic progress notes may provide EHR-like corpora that preserve clinically rigorous information on diagnoses, treatments, symptoms, imaging, laboratory findings and therapeutic trajectories without relying directly on sensitive patient records. This study evaluates whether large language models (LLMs) can generate realistic Spanish prostate cancer progress notes from published case reports, preserving clinical content, temporality and hospital-style conventions.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Journal of Biomedical Informatics | 45 | Top 0.1% | 18.8%
2 | Journal of the American Medical Informatics Association | 61 | Top 0.4% | 7.2%
3 | npj Digital Medicine | 97 | Top 0.7% | 6.9%
4 | BMC Medical Informatics and Decision Making | 39 | Top 0.4% | 6.4%
5 | Artificial Intelligence in Medicine | 15 | Top 0.1% | 6.4%
6 | International Journal of Medical Informatics | 25 | Top 0.3% | 4.4%
(50% of probability mass above)
7 | JMIR Medical Informatics | 17 | Top 0.2% | 4.3%
8 | JCO Clinical Cancer Informatics | 18 | Top 0.2% | 4.3%
9 | Scientific Reports | 3102 | Top 28% | 4.2%
10 | Frontiers in Digital Health | 20 | Top 0.3% | 3.1%
11 | Scientific Data | 174 | Top 0.6% | 2.9%
12 | JAMIA Open | 37 | Top 0.5% | 2.8%
13 | Journal of Medical Internet Research | 85 | Top 2% | 2.1%
14 | PLOS ONE | 4510 | Top 53% | 1.7%
15 | BMC Medical Research Methodology | 43 | Top 0.6% | 1.7%
16 | Biology Methods and Protocols | 53 | Top 1.0% | 1.7%
17 | iScience | 1063 | Top 17% | 1.5%
18 | Journal of Personalized Medicine | 28 | Top 0.6% | 1.2%
19 | Computers in Biology and Medicine | 120 | Top 3% | 1.0%
20 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 2% | 0.9%
21 | BMJ Health & Care Informatics | 13 | Top 0.8% | 0.8%
22 | Data in Brief | 13 | Top 0.4% | 0.8%
23 | Computer Methods and Programs in Biomedicine | 27 | Top 1.0% | 0.8%
24 | The Lancet Digital Health | 25 | Top 1% | 0.7%
25 | Bioinformatics | 1061 | Top 10% | 0.7%
26 | Informatics in Medicine Unlocked | 21 | Top 1% | 0.7%
27 | Patterns | 70 | Top 3% | 0.6%
28 | BMC Bioinformatics | 383 | Top 8% | 0.6%
29 | Healthcare | 16 | Top 3% | 0.5%
30 | JMIR Public Health and Surveillance | 45 | Top 5% | 0.5%
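
The 50% cutoff stated above can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch, not the site's own method: the function name `ranks_to_reach` is hypothetical, and the values are the rounded percentages shown on this page, so totals are only approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-30, copied from the list above.
# These are the rounded values displayed on the page.
probs = [
    18.8, 7.2, 6.9, 6.4, 6.4, 4.4, 4.3, 4.3, 4.2, 3.1,
    2.9, 2.8, 2.1, 1.7, 1.7, 1.7, 1.5, 1.2, 1.0, 0.9,
    0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5,
]


def ranks_to_reach(probabilities, threshold):
    """Return the smallest rank whose cumulative probability reaches
    `threshold` (in %), or None if the list never reaches it.
    (Hypothetical helper, for illustration only.)"""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None


if __name__ == "__main__":
    print(ranks_to_reach(probs, 50.0))  # rank at which the running total passes 50%
    print(round(sum(probs), 1))         # share covered by the 30 listed journals
```

With these rounded values the running total after rank 5 is 45.7% and after rank 6 is 50.1%, which matches the cutoff marked in the list. The 30 listed journals together cover about 90.2%; the remainder presumably falls on journals not shown here.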