Prompt Engineering Strategies Improve the Diagnostic Accuracy of GPT-4 Turbo in Neuroradiology Cases

Wada, A.; Akashi, T.; Shih, G.; Hagiwara, A.; Nishizawa, M.; Hayakawa, Y.; Kikuta, J.; Shimoji, K.; Sano, K.; Kamagata, K.; Nakanishi, A.; Aoki, S.

2024-05-01 radiology and imaging
10.1101/2024.04.29.24306583 medRxiv
Background: Large language models (LLMs) such as GPT-4 show promising capabilities in medical image analysis, but their practical utility is hindered by substantial misdiagnosis rates of 30-50%.

Purpose: To improve the diagnostic accuracy of GPT-4 Turbo in neuroradiology cases using prompt engineering strategies, thereby reducing misdiagnosis rates.

Materials and Methods: We used 751 publicly available neuroradiology cases from the American Journal of Neuroradiology Case of the Week Archives. Prompt instructions guided GPT-4 Turbo to analyze clinical and imaging data and generate a list of five candidate diagnoses with confidence levels. Strategies included role adoption as an imaging expert, step-by-step reasoning, and confidence assessment.

Results: Without any adjustments, GPT-4 Turbo correctly identified the top diagnosis in 55.1% of cases, with a misdiagnosis rate of 29.4%. Considering all five candidates raised applicability to 70.6%. Applying a 90% confidence threshold increased top-diagnosis accuracy to 72.9% and five-candidate applicability to 85.9%, while reducing misdiagnoses to 14.1%, but limited the analysis to half of the cases.

Conclusion: Prompt engineering strategies combined with confidence thresholds showed the potential to reduce misdiagnosis rates in neuroradiology cases analyzed by GPT-4 Turbo. This research paves the way for more feasible AI-assisted diagnostic imaging, in which AI suggestions contribute to human decision-making. However, the study does not analyze real-world clinical data. Further investigation across specialties and imaging modalities is needed to find thresholds that balance diagnostic accuracy and practical utility.
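The trade-off reported in the results (higher accuracy at the cost of covering fewer cases) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' code: the `CaseResult` record, the `summarize` helper, and the toy data are hypothetical, the threshold is assumed to apply to the confidence of the top candidate, and the misdiagnosis rate is assumed to be the share of retained cases in which none of the five candidates is correct (which matches the reported 70.6% + 29.4% and 85.9% + 14.1%).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseResult:
    """Hypothetical per-case record; not taken from the study."""
    confidence: float            # confidence (0-100) reported for the top candidate
    correct_rank: Optional[int]  # 1-5 if the true diagnosis is listed, else None


def summarize(cases, threshold=0.0):
    """Compute coverage and accuracy over cases at or above the threshold."""
    kept = [c for c in cases if c.confidence >= threshold]
    if not kept:
        return None  # no case passes the threshold
    n = len(kept)
    top1 = sum(c.correct_rank == 1 for c in kept) / n
    top5 = sum(c.correct_rank is not None for c in kept) / n
    return {
        "coverage": n / len(cases),  # share of cases still analyzed
        "top1": top1,                # top diagnosis correct
        "top5": top5,                # correct diagnosis among the five candidates
        "misdiagnosis": 1 - top5,    # none of the five candidates correct
    }


# Toy data, made up purely to exercise the function.
cases = [
    CaseResult(95, 1), CaseResult(92, 3), CaseResult(90, None),
    CaseResult(85, 1), CaseResult(60, None), CaseResult(40, 2),
    CaseResult(97, 1), CaseResult(70, None),
]

baseline = summarize(cases)
filtered = summarize(cases, threshold=90)
print(baseline)
print(filtered)
```

On this toy data, raising the threshold to 90 halves coverage while lowering the misdiagnosis share, which is the same qualitative pattern the abstract describes.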

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. European Radiology | 14 papers in training set | Top 0.1% | 37.8%
2. Scientific Reports | 3102 papers in training set | Top 8% | 9.3%
3. PLOS Digital Health | 91 papers in training set | Top 0.3% | 6.4%

50% of probability mass above

4. PLOS ONE | 4510 papers in training set | Top 33% | 4.4%
5. npj Digital Medicine | 97 papers in training set | Top 1% | 2.8%
6. GigaScience | 172 papers in training set | Top 1% | 1.9%
7. Medical Physics | 14 papers in training set | Top 0.3% | 1.8%
8. Diagnostics | 48 papers in training set | Top 1% | 1.5%
9. Frontiers in Artificial Intelligence | 18 papers in training set | Top 0.4% | 1.4%
10. Frontiers in Neuroinformatics | 38 papers in training set | Top 0.5% | 1.4%
11. JAMA Network Open | 127 papers in training set | Top 3% | 1.4%
12. PLOS Computational Biology | 1633 papers in training set | Top 18% | 1.4%
13. Journal of Medical Imaging | 11 papers in training set | Top 0.2% | 1.1%
14. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.7% | 1.0%
15. The Lancet Digital Health | 25 papers in training set | Top 0.9% | 0.8%
16. European Journal of Nuclear Medicine and Molecular Imaging | 19 papers in training set | Top 0.2% | 0.8%
17. JMIRx Med | 31 papers in training set | Top 2% | 0.8%
18. eBioMedicine | 130 papers in training set | Top 4% | 0.8%
19. Frontiers in Medicine | 113 papers in training set | Top 7% | 0.8%
20. Journal of Magnetic Resonance Imaging | 14 papers in training set | Top 0.6% | 0.7%
21. NeuroImage | 813 papers in training set | Top 6% | 0.7%
22. Computers in Biology and Medicine | 120 papers in training set | Top 5% | 0.7%
23. Frontiers in Oncology | 95 papers in training set | Top 4% | 0.7%
24. Artificial Intelligence in Medicine | 15 papers in training set | Top 0.8% | 0.7%
25. Physics in Medicine & Biology | 17 papers in training set | Top 0.5% | 0.7%
26. Patterns | 70 papers in training set | Top 3% | 0.5%
27. iScience | 1063 papers in training set | Top 40% | 0.5%
28. Stroke: Vascular and Interventional Neurology | 13 papers in training set | Top 0.4% | 0.5%
29. Computer Methods and Programs in Biomedicine | 27 papers in training set | Top 1% | 0.5%
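
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked from the listed percentages with a short cumulative sum. This is only an illustrative sketch: the `journals_to_reach` helper is a hypothetical name, and only the first five probabilities from the list are used.

```python
from itertools import accumulate

# Predicted probabilities (%) of the five highest-ranked journals in the list.
probs = [37.8, 9.3, 6.4, 4.4, 2.8]


def journals_to_reach(probs, target=50.0):
    """Return (rank, cumulative %) at which the running total first reaches target."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return rank, total
    return None  # target not reached with the given probabilities


rank, total = journals_to_reach(probs)
print(rank, round(total, 1))
```

The running total is 37.8% after the first journal, 47.1% after the second, and 53.5% after the third, so the 50% mark is crossed at rank 3.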