Comparison of the Diagnostic Performance from Patient's Medical History and Imaging Findings between GPT-4 based ChatGPT and Radiologists in Challenging Neuroradiology Cases

Horiuchi, D.; Tatekawa, H.; Oura, T.; Oue, S.; Walston, S. L.; Takita, H.; Matsushita, S.; Mitsuyama, Y.; Shimono, T.; Miki, Y.; Ueda, D.

2023-08-29 radiology and imaging
10.1101/2023.08.28.23294607

Purpose: To compare the diagnostic performance of Chat Generative Pre-trained Transformer (ChatGPT), based on the GPT-4 architecture, with that of radiologists when diagnosing from the patient's medical history and imaging findings in challenging neuroradiology cases.

Methods: We collected 30 consecutive "Freiburg Neuropathology Case Conference" cases from the journal Clinical Neuroradiology between March 2016 and June 2023. GPT-4 based ChatGPT generated diagnoses from the provided medical history and imaging findings for each case, and the diagnostic accuracy rate was determined against the published ground truth. Three radiologists with different levels of experience (2, 4, and 7 years, respectively) independently reviewed all cases using the same medical history and imaging findings, and their diagnostic accuracy rates were evaluated. Chi-square tests were performed to compare the diagnostic accuracy rates between ChatGPT and each radiologist.

Results: ChatGPT achieved an accuracy rate of 23% (7/30 cases). The radiologists achieved the following accuracy rates: 27% (8/30) for a junior radiology resident, 30% (9/30) for a senior radiology resident, and 47% (14/30) for a board-certified radiologist. ChatGPT's diagnostic accuracy rate was lower than that of each radiologist, although the differences were not significant (p = 0.99, 0.77, and 0.10, respectively).

Conclusion: The diagnostic performance of GPT-4 based ChatGPT did not reach the level of junior or senior radiology residents or of board-certified radiologists in challenging neuroradiology cases. While ChatGPT holds great promise in the field of neuroradiology, radiologists should be aware of its current performance and limitations for optimal utilization.
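The pairwise comparisons reported in the abstract can be sketched as 2x2 chi-square tests on the correct/incorrect counts. The abstract does not say whether a continuity correction was used, so this is a minimal sketch that assumes Yates' correction, which gives p-values close to those reported. The function name `yates_chi_square_2x2` and the reader labels are illustrative, not taken from the paper.

```python
import math

def yates_chi_square_2x2(correct_a, correct_b, n_a, n_b):
    """Chi-square test (1 degree of freedom) comparing two proportions.

    Assumption: Yates' continuity correction is applied; the abstract
    does not state which variant of the test the authors used.
    """
    a, b = correct_a, n_a - correct_a   # reader A: correct, incorrect
    c, d = correct_b, n_b - correct_b   # reader B: correct, incorrect
    n = a + b + c + d
    # Yates' correction: shrink |ad - bc| by n/2, never below zero
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Correct diagnoses out of 30 cases, as reported in the abstract
chatgpt_correct = 7
readers = {
    "junior resident": 8,
    "senior resident": 9,
    "board-certified radiologist": 14,
}

results = {
    name: yates_chi_square_2x2(chatgpt_correct, correct, 30, 30)
    for name, correct in readers.items()
}
for name, (chi2, p_value) in results.items():
    print(f"ChatGPT (7/30) vs {name} ({readers[name]}/30): "
          f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

With only 30 cases per reader, even the 7/30 versus 14/30 gap does not reach the conventional 0.05 threshold under this test, which is consistent with the abstract's statement that the differences were not significant.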

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. European Radiology (based on 11 papers, Top 0.1%): 21.1%
2. Scientific Reports (based on 701 papers, Top 13%): 10.9%
3. Neuro-Oncology Advances (based on 14 papers, Top 0.4%): 6.9%
4. PLOS ONE (based on 1737 papers, Top 60%): 6.9%
5. Diagnostics (based on 36 papers, Top 0.6%): 4.8%

50% of probability mass above

6. Cureus (based on 64 papers, Top 5%): 3.0%
7. Stroke: Vascular and Interventional Neurology (based on 12 papers, Top 0.9%): 2.5%
8. Journal of Magnetic Resonance Imaging (based on 10 papers, Top 1.0%): 2.5%
9. Annals of Translational Medicine (based on 14 papers, Top 1%): 2.4%
10. Scientific Data (based on 30 papers, Top 0.9%): 2.4%
11. Brain and Behavior (based on 19 papers, Top 1%): 2.4%
12. Frontiers in Neurology (based on 74 papers, Top 7%): 1.9%
13. Frontiers in Oncology (based on 34 papers, Top 4%): 1.9%
14. Heliyon (based on 57 papers, Top 5%): 1.7%
15. Medicine (based on 29 papers, Top 4%): 1.7%
16. Informatics in Medicine Unlocked (based on 11 papers, Top 1%): 1.4%
17. Radiotherapy and Oncology (based on 11 papers, Top 1%): 1.4%
18. Journal of Clinical Medicine (based on 77 papers, Top 12%): 1.3%
19. BMC Cancer (based on 21 papers, Top 4%): 0.9%
20. NeuroImage: Clinical (based on 77 papers, Top 7%): 0.9%
21. Journal of Neurotrauma (based on 11 papers, Top 2%): 0.9%
22. Nature Communications (based on 483 papers, Top 39%): 0.9%
23. Cancers (based on 57 papers, Top 7%): 0.7%
24. Magnetic Resonance in Medicine (based on 11 papers, Top 1%): 0.7%
25. Journal of NeuroEngineering and Rehabilitation (based on 14 papers, Top 3%): 0.7%
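
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. This is a small sketch using the first eight values from the ranking; the helper name `journals_to_reach` is hypothetical.

```python
def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative total) at which the running sum first
    meets or exceeds the threshold, or None if it never does."""
    total = 0.0
    for rank, probability in enumerate(probabilities, start=1):
        total += probability
        if total >= threshold:
            return rank, total
    return None

# Predicted probabilities (in percent) for ranks 1 to 8, as listed
listed = [21.1, 10.9, 6.9, 6.9, 4.8, 3.0, 2.5, 2.5]

rank, total = journals_to_reach(listed, 50.0)
print(f"Threshold reached at rank {rank} with {total:.1f}% cumulative mass")
```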