Large Language Models in Radiology Reporting - A Systematic Review of Performance, Limitations, and Clinical Implications

Artsi, Y.; Klang, E.; Collins, J. D.; Glicksberg, B. S.; Korfiatis, P.; Nadkarni, G.; Sorin, V.

2025-03-19 · radiology and imaging
medRxiv · doi:10.1101/2025.03.18.25324193
Background: Large language models (LLMs) have emerged as potential tools for automated radiology reporting. However, concerns regarding their fidelity, reliability, and clinical applicability remain. This systematic review examines the current literature on LLM-generated radiology reports.

Methods: We conducted a systematic search of MEDLINE, Google Scholar, Scopus, and Web of Science to identify studies published between January 2015 and February 2025. Studies evaluating LLM-generated radiology reports were included. The review follows PRISMA guidelines; risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.

Results: Nine studies met the inclusion criteria: six evaluated full radiology reports and three focused on impression generation. Six studies assessed base LLMs, and three evaluated fine-tuned models. Fine-tuned models aligned more closely with expert evaluations and scored higher on natural language processing metrics than base models. All LLMs exhibited hallucinations, misdiagnoses, and inconsistencies.

Conclusion: LLMs show promise in radiology reporting, but limitations in diagnostic accuracy and hallucinations necessitate human oversight. Future research should focus on improving evaluation frameworks, incorporating diverse datasets, and prospectively validating AI-generated reports in clinical workflows.

Matching journals

The top 4 journals account for 50% of the predicted probability mass (marked below).

Rank  Journal                                          Papers in training set  Percentile  Probability
 1    European Radiology                                  14                   Top 0.1%    34.4%
 2    Scientific Reports                                3102                   Top 8%       8.8%
 3    The Lancet Digital Health                           25                   Top 0.1%     5.1%
 4    npj Digital Medicine                                97                   Top 1%       4.1%
 --- 50% of probability mass above this line ---
 5    PLOS ONE                                          4510                   Top 37%      3.7%
 6    JAMA Network Open                                  127                   Top 1%       2.9%
 7    PLOS Digital Health                                 91                   Top 0.9%     2.7%
 8    JCO Clinical Cancer Informatics                     18                   Top 0.3%     2.6%
 9    Diagnostics                                         48                   Top 0.7%     2.5%
10    Medical Physics                                     14                   Top 0.3%     2.2%
11    Annals of Translational Medicine                    17                   Top 0.9%     1.3%
12    Frontiers in Medicine                              113                   Top 4%       1.3%
13    npj Precision Oncology                              48                   Top 0.8%     1.3%
14    Neuro-Oncology Advances                             24                   Top 0.4%     0.9%
15    GigaScience                                        172                   Top 2%       0.9%
16    Archives of Clinical and Biomedical Research        28                   Top 2%       0.9%
17    eBioMedicine                                       130                   Top 3%       0.8%
18    Frontiers in Oncology                               95                   Top 3%       0.8%
19    JMIRx Med                                           31                   Top 2%       0.8%
20    Proceedings of the National Academy of Sciences   2130                   Top 43%      0.8%
21    Journal of Medical Imaging                          11                   Top 0.3%     0.7%
22    Nature Communications                             4913                   Top 63%      0.7%
23    Stroke: Vascular and Interventional Neurology       13                   Top 0.4%     0.7%
24    Artificial Intelligence in Medicine                 15                   Top 0.8%     0.7%
25    BMJ Open                                           554                   Top 13%      0.7%
26    Frontiers in Artificial Intelligence                18                   Top 1.0%     0.5%
27    iScience                                          1063                   Top 39%      0.5%
28    IEEE Access                                         31                   Top 1%       0.5%
29    Computer Methods and Programs in Biomedicine        27                   Top 1%       0.5%