
The Expertise Paradox: Who Benefits from LLM-Assisted Brain MRI Differential Diagnosis?

Schramm, S.; Le Guellec, B.; Topka, M.; Svec, M.; Backhaus, P.; Eisenkolb, V. M.; Riedel, E. O.; Beyrle, M.; Platzek, P.-S.; Ramschütz, C.; Paprottka, K. J.; Renz, M.; Bodden, J.; Kirschke, J. S.; Ziegelmeyer, S.; Busch, F.; Makowski, M. R.; Adams, L. C.; Bressem, K. K.; Hedderich, D. M.; Wiestler, B.; Kim, S. H.

2025-10-28 radiology and imaging
10.1101/2025.10.28.25338816 medRxiv

Purpose: To evaluate how reader experience influences the diagnostic benefit from LLM assistance in brain MRI differential diagnosis.

Materials and Methods: Neuroradiologists (n = 4), radiology residents (n = 4), and neurology/neurosurgery residents (n = 4) were recruited. A dataset of complex brain MRI cases was curated from the local imaging database (n = 40). For each case, readers provided a textual description of the main imaging finding and their top three differential diagnoses ("Unassisted"). Three state-of-the-art large language models (GPT-4.1, Gemini 2.5 Pro, DeepSeek-R1) were prompted to generate top-three differentials based on the clinical case description and reader-specific findings. Readers then revised their differential diagnoses after reviewing GPT-4.1 suggestions ("Assisted"). To evaluate the association between reader experience and diagnostic benefit, a cumulative link mixed model (CLMM) was fitted, with change in diagnostic result as ordinal outcome, reader experience as predictor, and random intercepts for rater and case.

Results: LLM-generated differential diagnoses achieved the highest top-3 accuracy when provided with image descriptions from neuroradiologists (top-3: 78.8-83.8%), followed by radiology residents (top-3: 71.8-77.6%), and neurology/neurosurgery residents (top-3: 62.6-64.5%). In contrast, mean relative gains in top-3 accuracy through LLM assistance diminished with increasing experience, with +19.2% for neurology/neurosurgery residents (from 43.2% to 62.6%), +14.7% for radiology residents (from 59.6% to 74.4%), and +4.4% for neuroradiologists (from 83.1% to 87.5%). The CLMM demonstrated a significant negative association between reader experience and diagnostic benefit from LLM assistance (β = -0.10, p = 0.005).

Conclusion: With increasing reader experience, absolute diagnostic LLM performance with reader-generated input improved, while relative diagnostic gains through LLM assistance paradoxically diminished. Our findings call attention to the divergence between standalone LLM performance and clinically relevant reader benefit, and emphasize the need to account for human-AI interaction in this context.
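
The two quantities the abstract relies on, top-3 accuracy and the per-case change between the "Unassisted" and "Assisted" readings, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy cases, the exact-string matching of diagnoses (the study more likely relied on expert judgment), and the three-level coding of the change (-1 worse, 0 unchanged, +1 improved), since the abstract does not state how the ordinal outcome was coded.

```python
from typing import List, Sequence


def top3_hit(differentials: Sequence[str], truth: str) -> bool:
    """True if the reference diagnosis is among the first three differentials.

    Exact, case-insensitive string matching is an illustrative simplification.
    """
    candidates = [d.strip().lower() for d in differentials[:3]]
    return truth.strip().lower() in candidates


def top3_accuracy(readings: Sequence[Sequence[str]], truths: Sequence[str]) -> float:
    """Fraction of cases whose reference diagnosis appears in the top three."""
    hits = [top3_hit(d, t) for d, t in zip(readings, truths)]
    return sum(hits) / len(hits)


def change_category(unassisted_hit: bool, assisted_hit: bool) -> int:
    """Hypothetical ordinal coding: -1 worse, 0 unchanged, +1 improved."""
    return int(assisted_hit) - int(unassisted_hit)


# Toy data (hypothetical, not taken from the study).
truths = ["glioblastoma", "multiple sclerosis", "cerebral abscess", "CNS lymphoma"]
unassisted = [
    ["metastasis", "glioblastoma", "cerebral abscess"],
    ["ADEM", "vasculitis", "neuromyelitis optica"],
    ["metastasis", "glioblastoma", "CNS lymphoma"],
    ["CNS lymphoma", "glioblastoma", "metastasis"],
]
assisted = [
    ["glioblastoma", "metastasis", "cerebral abscess"],
    ["multiple sclerosis", "ADEM", "vasculitis"],
    ["metastasis", "glioblastoma", "CNS lymphoma"],
    ["CNS lymphoma", "glioblastoma", "metastasis"],
]

acc_unassisted = top3_accuracy(unassisted, truths)
acc_assisted = top3_accuracy(assisted, truths)
changes: List[int] = [
    change_category(top3_hit(u, t), top3_hit(a, t))
    for u, a, t in zip(unassisted, assisted, truths)
]

print(f"unassisted top-3 accuracy: {acc_unassisted:.2f}")
print(f"assisted top-3 accuracy:   {acc_assisted:.2f}")
print(f"per-case change:           {changes}")
```

In a design like the one described, per-case change values of this kind would form the ordinal outcome of the mixed model, with one row per reader and case.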

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. European Radiology: 14 papers in training set, Top 0.1%, 42.6%
2. Scientific Reports: 3102 papers in training set, Top 8%, 8.8%
(50% of probability mass above)
3. GigaScience: 172 papers in training set, Top 0.7%, 2.9%
4. PLOS ONE: 4510 papers in training set, Top 43%, 2.9%
5. NeuroImage: Clinical: 132 papers in training set, Top 2%, 2.5%
6. Human Brain Mapping: 295 papers in training set, Top 2%, 2.2%
7. npj Digital Medicine: 97 papers in training set, Top 2%, 2.0%
8. JAMA Network Open: 127 papers in training set, Top 2%, 2.0%
9. NeuroImage: 813 papers in training set, Top 4%, 1.9%
10. Neuro-Oncology Advances: 24 papers in training set, Top 0.3%, 1.6%
11. Aperture Neuro: 18 papers in training set, Top 0.2%, 1.6%
12. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 36%, 1.4%
13. iScience: 1063 papers in training set, Top 22%, 1.2%
14. BMC Medicine: 163 papers in training set, Top 5%, 1.2%
15. npj Precision Oncology: 48 papers in training set, Top 0.9%, 1.0%
16. PLOS Digital Health: 91 papers in training set, Top 2%, 1.0%
17. Computers in Biology and Medicine: 120 papers in training set, Top 4%, 0.8%
18. The Lancet Digital Health: 25 papers in training set, Top 0.9%, 0.8%
19. Imaging Neuroscience: 242 papers in training set, Top 3%, 0.8%
20. Artificial Intelligence in Medicine: 15 papers in training set, Top 0.6%, 0.8%
21. Brain Communications: 147 papers in training set, Top 3%, 0.8%
22. Journal of Medical Imaging: 11 papers in training set, Top 0.3%, 0.8%
23. Diagnostics: 48 papers in training set, Top 2%, 0.8%
24. Nature Communications: 4913 papers in training set, Top 65%, 0.7%
25. Frontiers in Psychology: 49 papers in training set, Top 2%, 0.5%
26. Frontiers in Artificial Intelligence: 18 papers in training set, Top 1.0%, 0.5%
27. Frontiers in Oncology: 95 papers in training set, Top 4%, 0.5%
28. eBioMedicine: 130 papers in training set, Top 6%, 0.5%
29. Patterns: 70 papers in training set, Top 3%, 0.5%
30. Journal of Magnetic Resonance Imaging: 14 papers in training set, Top 0.7%, 0.5%
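
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked with simple arithmetic on the listed percentages. The sketch below is only an illustration: the helper name `journals_to_reach` is hypothetical, and the input is the first five probabilities from the ranking above.

```python
from typing import Sequence, Tuple


def journals_to_reach(probabilities: Sequence[float], threshold: float = 50.0) -> Tuple[int, float]:
    """Return how many top-ranked entries are needed for the cumulative
    probability (in percent) to reach the threshold, plus the mass reached."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    raise ValueError("threshold not reached by the given probabilities")


# Predicted probabilities (percent) of the five top-ranked journals in the list.
top_probs = [42.6, 8.8, 2.9, 2.9, 2.5]

count, mass = journals_to_reach(top_probs)
print(f"{count} journals reach {mass:.1f}% of the probability mass")
```

With these values the threshold is crossed at the second entry, since 42.6 + 8.8 = 51.4.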