Artificial Intelligence in Neuro-Oncology: Assessing ChatGPT Accuracy in MRI Interpretation and Treatment Advice
Ishaque, A. H.; Boutet, A.; Hiremath, S. B.; Mullarkey, M. P.; Peris-Celda, M.; Zadeh, G.
Purpose: Large language models (LLMs) have demonstrated advanced capabilities in interpreting text and visual inputs. Their potential to transform oncological practice is significant, but their accuracy and reliability in interpreting medical imaging and offering management suggestions remain underexplored. This study aimed to evaluate the performance of ChatGPT in interpreting T1-weighted contrast-enhanced MRI images of meningiomas and glioblastomas and in providing treatment recommendations in response to simulated patient inquiries.

Methods: This observational cohort study used publicly available MRI datasets. Thirty cases of meningiomas and glioblastomas were randomly selected, yielding 90 images (three orthogonal planes per case). ChatGPT-4o was tasked with interpreting these images and responding to six standardized patient-simulated questions. Two neuroradiologists and neurosurgeons assessed ChatGPT's performance using five-point Likert scales, and their inter-rater agreement was evaluated.

Results: ChatGPT identified MRI sequences with 91.7% accuracy and localized tumors correctly in 66.7% of cases. Tumor size was qualitatively described in 85% of cases, and the median acceptability rating from neuroradiologists was 4.0 (IQR 4.0-5.0). ChatGPT included meningioma in the differential diagnosis for 73.3% of meningioma cases and glioma for 83.3% of glioblastoma cases. Inter-rater agreement among neuroradiologists ranged from moderate to good (κ = 0.45-0.72). Although surgical treatment was suggested in all symptomatic cases, neurosurgeons' acceptability ratings varied, and their inter-rater reliability was poor.

Conclusions: ChatGPT demonstrates potential in interpreting neuro-oncological MRI images and offering preliminary management recommendations. However, errors in tumor localization and variability in recommendation acceptability underscore the need for physician oversight and further refinement of LLMs before clinical integration.
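The abstract reports inter-rater agreement as κ but does not state which variant was computed. As a minimal sketch, assuming unweighted Cohen's kappa for two raters scoring the same items, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal score frequencies. The function name `cohens_kappa` and the ratings below are hypothetical illustrations, not study data. For ordinal Likert scores, a weighted kappa (for example scikit-learn's `cohen_kappa_score` with `weights="quadratic"`) is often preferred, and the study may have used a different statistic.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters (hypothetical helper, not the study's code)."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the identical score by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequencies, summed over scores.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[score] * freq_b[score] for score in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical score for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical five-point Likert ratings from two raters on eight items.
rater_1 = [4, 5, 4, 3, 5, 4, 2, 4]
rater_2 = [4, 5, 3, 3, 5, 4, 4, 4]
print(round(cohens_kappa(rater_1, rater_2), 3))  # prints 0.619
```

Here p_o = 6/8 and p_e = 22/64, so κ = 13/21 ≈ 0.62. Under the conventional Landis and Koch bands (0.41-0.60 moderate, 0.61-0.80 substantial), values such as the reported 0.45-0.72 span moderate to substantial agreement, which the abstract describes as "moderate to good."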