Comparison of the diagnostic accuracy among GPT-4 based ChatGPT, GPT-4V based ChatGPT, and radiologists in musculoskeletal radiology

Horiuchi, D.; Tatekawa, H.; Oura, T.; Shimono, T.; Walston, S. L.; Takita, H.; Matsushita, S.; Mitsuyama, Y.; Miki, Y.; Ueda, D.

2023-12-09 · Radiology and Imaging
10.1101/2023.12.07.23299707
Objective: To compare the diagnostic accuracy of Generative Pre-trained Transformer (GPT)-4 based ChatGPT, GPT-4 with vision (GPT-4V) based ChatGPT, and radiologists in musculoskeletal radiology.

Materials and Methods: We included 106 "Test Yourself" cases from Skeletal Radiology published between January 2014 and September 2023. We input the medical history and imaging findings into GPT-4 based ChatGPT and the medical history and images into GPT-4V based ChatGPT; each then generated a diagnosis for every case. Two radiologists (a radiology resident and a board-certified radiologist) independently provided diagnoses for all cases. Diagnostic accuracy rates were determined against the published ground truth, and chi-square tests were performed to compare the diagnostic accuracy of GPT-4 based ChatGPT, GPT-4V based ChatGPT, and the radiologists.

Results: GPT-4 based ChatGPT significantly outperformed GPT-4V based ChatGPT (p < 0.001), with accuracy rates of 43% (46/106) and 8% (9/106), respectively. The radiology resident and the board-certified radiologist achieved accuracy rates of 41% (43/106) and 53% (56/106), respectively. The diagnostic accuracy of GPT-4 based ChatGPT was comparable to that of the radiology resident and lower than that of the board-certified radiologist, although neither difference was significant (p = 0.78 and p = 0.22, respectively). The diagnostic accuracy of GPT-4V based ChatGPT was significantly lower than that of both radiologists (p < 0.001 for each).

Conclusion: GPT-4 based ChatGPT demonstrated significantly higher diagnostic accuracy than GPT-4V based ChatGPT. While GPT-4 based ChatGPT's diagnostic performance was comparable to that of the radiology resident, it did not reach the performance level of the board-certified radiologist in musculoskeletal radiology.

Matching journals

The top 5 journals account for just over 50% of the predicted probability mass.

1. European Radiology: 21.7% (based on 11 papers; Top 0.1%)
2. Scientific Reports: 13.8% (based on 701 papers; Top 6%)
3. Diagnostics: 7.0% (based on 36 papers; Top 0.2%)
4. PLOS ONE: 6.4% (based on 1737 papers; Top 62%)
5. Annals of Translational Medicine: 3.1% (based on 14 papers; Top 0.7%)
-- 50% of probability mass above --
6. Cureus: 2.7% (based on 64 papers; Top 6%)
7. Frontiers in Oncology: 2.7% (based on 34 papers; Top 3%)
8. Neuro-Oncology Advances: 2.6% (based on 14 papers; Top 0.7%)
9. npj Digital Medicine: 2.0% (based on 85 papers; Top 8%)
10. Journal of Magnetic Resonance Imaging: 1.9% (based on 10 papers; Top 1%)
11. Radiotherapy and Oncology: 1.7% (based on 11 papers; Top 1%)
12. BMC Cancer: 1.7% (based on 21 papers; Top 3%)
13. Heliyon: 1.5% (based on 57 papers; Top 6%)
14. Medicine: 1.5% (based on 29 papers; Top 5%)
15. Journal of Clinical Medicine: 1.5% (based on 77 papers; Top 10%)
16. Informatics in Medicine Unlocked: 1.5% (based on 11 papers; Top 1%)
17. Computers in Biology and Medicine: 1.3% (based on 39 papers; Top 5%)
18. PLOS Digital Health: 0.9% (based on 88 papers; Top 11%)
19. Journal of the American Medical Informatics Association: 0.9% (based on 53 papers; Top 6%)
20. Stroke: Vascular and Interventional Neurology: 0.9% (based on 12 papers; Top 1%)
21. Cancers: 0.9% (based on 57 papers; Top 6%)
22. Scientific Data: 0.9% (based on 30 papers; Top 3%)
23. Rheumatology: 0.7% (based on 21 papers; Top 1.0%)
24. BMJ Open: 0.7% (based on 553 papers; Top 51%)
25. eBioMedicine: 0.7% (based on 82 papers; Top 7%)
26. International Journal of Cancer: 0.7% (based on 18 papers; Top 2%)
27. JCO Clinical Cancer Informatics: 0.7% (based on 14 papers; Top 4%)
28. Archives of Clinical and Biomedical Research: 0.7% (based on 18 papers; Top 2%)
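The "50% of probability mass" cut above the fifth entry can be checked directly from the listed percentages. Below is a minimal sketch of that arithmetic; the journal names and probabilities are taken from the table, and the variable names are illustrative, not part of any published tool.

```python
# Sketch: find how many top-ranked journals are needed for the cumulative
# predicted probability mass to reach 50%, using percentages from the table.
from itertools import accumulate

# (journal, predicted probability in %) -- top entries from the table above
probs = [
    ("European Radiology", 21.7),
    ("Scientific Reports", 13.8),
    ("Diagnostics", 7.0),
    ("PLOS ONE", 6.4),
    ("Annals of Translational Medicine", 3.1),
    ("Cureus", 2.7),
]

# Running totals: 21.7, 35.5, 42.5, 48.9, 52.0, 54.7
cumulative = list(accumulate(p for _, p in probs))

# First rank at which the running total reaches 50%
n_to_half = next(i + 1 for i, c in enumerate(cumulative) if c >= 50.0)
print(n_to_half, round(cumulative[n_to_half - 1], 1))  # -> 5 52.0
```

The top 4 journals sum to 48.9%, so the fifth entry is what pushes the cumulative mass past 50% (to 52.0%), matching where the divider sits in the table.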