Impact of Multimodal Prompt Elements on Diagnostic Performance of GPT-4(V) in Challenging Brain MRI Cases

Schramm, S.; Preis, S.; Metz, M.-C.; Jung, K.; Schmitz-Koep, B.; Zimmer, C.; Wiestler, B.; Hedderich, D. M.; Kim, S. H.

2024-03-06 radiology and imaging
10.1101/2024.03.05.24303767

Background: Recent studies have explored the application of multimodal large language models (LLMs) in radiological differential diagnosis. Yet, how different multimodal input combinations affect diagnostic performance is not well understood.

Purpose: To evaluate the impact of varying multimodal input elements on the accuracy of GPT-4(V)-based brain MRI differential diagnosis.

Methods: Thirty brain MRI cases with a challenging yet verified diagnosis were selected. Seven prompt groups with variations of four input elements (image, image annotation, medical history, image description) were defined. For each MRI case and prompt group, three identical queries were performed using an LLM-based search engine (© PerplexityAI, powered by GPT-4(V)). Accuracy of LLM-generated differential diagnoses was rated using a binary and a numeric scoring system and analyzed using a chi-square test and a Kruskal-Wallis test. Results were corrected for false discovery rate employing the Benjamini-Hochberg procedure. Regression analyses were performed to determine the contribution of each individual input element to diagnostic performance.

Results: The prompt group containing an annotated image, medical history, and image description as input exhibited the highest diagnostic accuracy (67.8% correct responses). Significant differences were observed between prompt groups, especially between groups that contained the image description among their inputs and those that did not. Regression analyses confirmed a large positive effect of the image description on diagnostic accuracy (p << 0.001), as well as a moderate positive effect of the medical history (p < 0.001). The presence of unannotated or annotated images had only minor or insignificant effects on diagnostic accuracy.

Conclusion: The textual description of radiological image findings was identified as the strongest contributor to performance of GPT-4(V) in brain MRI differential diagnosis, followed by the medical history. The unannotated or annotated image alone yielded very low diagnostic performance. These findings offer guidance on the effective utilization of multimodal LLMs in clinical practice.
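
The abstract mentions correcting results for false discovery rate with the Benjamini-Hochberg procedure. As an illustration only, the following is a minimal pure-Python sketch of that adjustment. The function name `benjamini_hochberg` and the p-values in `raw` are hypothetical and are not taken from the study; the authors' actual software and settings are not stated in the abstract.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone (a smaller raw p never gets a larger adjusted p).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from several pairwise comparisons
# (illustrative numbers, not data from the paper).
raw = [0.001, 0.008, 0.039, 0.041, 0.27]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  ->  BH-adjusted p = {q:.5f}")
```

A comparison would then be called significant at a chosen false discovery rate (for example 0.05) when its adjusted value falls below that level.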

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Each entry lists rank, journal, the number of papers the match is based on, the percentile shown on the page, and the predicted probability.

1. European Radiology (based on 11 papers; Top 0.1%): 21.6%
2. Scientific Reports (based on 701 papers; Top 19%): 8.3%
3. Neuro-Oncology Advances (based on 14 papers; Top 0.3%): 8.3%
4. PLOS ONE (based on 1737 papers; Top 60%): 7.0%
5. Brain and Behavior (based on 19 papers; Top 0.9%): 2.5%
6. Journal of Magnetic Resonance Imaging (based on 10 papers; Top 1.0%): 2.5%

50% of probability mass above

7. Journal of Clinical Medicine (based on 77 papers; Top 6%): 2.5%
8. Annals of Translational Medicine (based on 14 papers; Top 1%): 2.5%
9. Cureus (based on 64 papers; Top 8%): 2.0%
10. Diagnostics (based on 36 papers; Top 2%): 1.9%
11. Informatics in Medicine Unlocked (based on 11 papers; Top 0.9%): 1.9%
12. NeuroImage: Clinical (based on 77 papers; Top 5%): 1.9%
13. Computers in Biology and Medicine (based on 39 papers; Top 4%): 1.7%
14. npj Digital Medicine (based on 85 papers; Top 9%): 1.7%
15. Heliyon (based on 57 papers; Top 6%): 1.4%
16. Scientific Data (based on 30 papers; Top 2%): 1.4%
17. Journal of the American Medical Informatics Association (based on 53 papers; Top 5%): 1.4%
18. Radiotherapy and Oncology (based on 11 papers; Top 1%): 1.4%
19. Medicine (based on 29 papers; Top 6%): 1.3%
20. Frontiers in Oncology (based on 34 papers; Top 5%): 1.3%
21. Stroke: Vascular and Interventional Neurology (based on 12 papers; Top 1%): 0.9%
22. Frontiers in Neurology (based on 74 papers; Top 10%): 0.9%
23. International Journal of Medical Informatics (based on 25 papers; Top 6%): 0.7%
24. BMC Medical Informatics and Decision Making (based on 36 papers; Top 7%): 0.7%
25. Journal of Neurotrauma (based on 11 papers; Top 2%): 0.7%
26. JCO Clinical Cancer Informatics (based on 14 papers; Top 4%): 0.7%
27. BMC Cancer (based on 21 papers; Top 5%): 0.7%
28. Archives of Clinical and Biomedical Research (based on 18 papers; Top 2%): 0.7%
29. PLOS Digital Health (based on 88 papers; Top 12%): 0.7%
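
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by summing the listed probabilities in rank order. The sketch below does this for the first eight listed values; the helper name `journals_to_reach` is hypothetical, and because the displayed percentages are rounded, the running total is only approximate.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative_total) once the running sum reaches threshold.

    Returns None if the threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None


# Predicted probabilities (in percent) of the eight highest-ranked
# journals, copied in rank order from the list above.
top_probs = [21.6, 8.3, 8.3, 7.0, 2.5, 2.5, 2.5, 2.5]

result = journals_to_reach(top_probs, 50.0)
count, total = result
print(f"{count} journals reach {total:.1f}% of the probability mass")
```

With these values the running sum is about 47.7% after five journals and about 50.2% after six, which matches the position of the 50% marker in the list.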