
Assessing Large Language Models for Oncology Data Inference from Radiology Reports

Chen, L.-C.; Zack, T.; Demirci, A.; Sushil, M.; Miao, B.; Kasap, C.; Butte, A. J.; Collisson, E.; Hong, J.

2024-05-23 oncology
10.1101/2024.05.23.24307579 medRxiv

Purpose: We examined the effectiveness of proprietary and open Large Language Models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports.

Methods: We analyzed 203 deidentified radiology reports, manually annotated for disease status, location, and indeterminate nodules needing follow-up. Using GPT-4, GPT-3.5-turbo, and open models such as Gemma-7B and Llama3-8B, we employed strategies such as ablation and prompt engineering to boost accuracy. Discrepancies between human and model interpretations were reviewed by a secondary oncologist.

Results: Among 164 pancreatic adenocarcinoma patients, GPT-4 showed the highest accuracy in inferring disease status, achieving 75.5% correctness (F1-micro). Open models Mistral-7B and Llama3-8B performed comparably, with accuracies of 68.6% and 61.4%, respectively. Mistral-7B excelled in deriving correct inferences directly from "Objective Findings". Most tested models were proficient at identifying disease-containing anatomical locations from a list of choices, with GPT-4 and Llama3-8B showing near parity in precision and recall for disease site identification. However, open models struggled to differentiate benign from malignant post-surgical changes, which reduced their precision in identifying findings indeterminate for cancer. A secondary review occasionally favored GPT-3.5's interpretations, indicating the variability in human judgment.

Conclusion: LLMs, especially GPT-4, are proficient in deriving oncological insights from radiology reports. Their performance is enhanced by effective summarization strategies, demonstrating their potential in clinical support and healthcare analytics. This study also underscores the possibility of zero-shot open model utility in environments where proprietary models are restricted. Finally, by providing a set of annotated radiology reports, this paper presents a valuable dataset for further LLM research in oncology.
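The abstract reports accuracy as "F1-micro". As a minimal sketch of why the two can coincide, the Python below computes micro-averaged F1 by pooling true positives, false positives, and false negatives across classes. It assumes each report receives exactly one disease-status label; the label names and the four example predictions are hypothetical and are not taken from the study's data.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label predictions.

    Counts are pooled over every class before precision and recall
    are computed, which is what "micro" averaging means.
    """
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # predicted this class and it was correct
            elif pred == label and truth != label:
                fp += 1  # predicted this class but it was wrong
            elif pred != label and truth == label:
                fn += 1  # missed an instance of this class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations and model outputs (illustration only).
annotated = ["progression", "stable", "response", "stable"]
predicted = ["progression", "stable", "stable", "stable"]

accuracy = sum(t == p for t, p in zip(annotated, predicted)) / len(annotated)
print(micro_f1(annotated, predicted), accuracy)
```

Under the one-label-per-report assumption, every wrong prediction counts once as a false positive and once as a false negative, so pooled precision, recall, and F1 all equal plain accuracy. This would not hold for a multi-label task such as selecting several disease sites at once.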

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.1% | 22.5%
2. Artificial Intelligence in Medicine | 15 papers in training set | Top 0.1% | 12.3%
3. Frontiers in Oncology | 95 papers in training set | Top 0.6% | 6.3%
4. Scientific Reports | 3102 papers in training set | Top 24% | 4.9%
5. npj Digital Medicine | 97 papers in training set | Top 1% | 4.3%
50% of probability mass above
6. Computers in Biology and Medicine | 120 papers in training set | Top 0.7% | 4.0%
7. PLOS Computational Biology | 1633 papers in training set | Top 9% | 4.0%
8. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 1% | 2.4%
9. npj Precision Oncology | 48 papers in training set | Top 0.3% | 2.4%
10. PLOS ONE | 4510 papers in training set | Top 48% | 2.1%
11. Cancer Medicine | 24 papers in training set | Top 0.6% | 1.9%
12. Biology Methods and Protocols | 53 papers in training set | Top 0.9% | 1.7%
13. JAMA Network Open | 127 papers in training set | Top 2% | 1.7%
14. European Journal of Cancer | 10 papers in training set | Top 0.2% | 1.7%
15. JMIR Medical Informatics | 17 papers in training set | Top 0.8% | 1.5%
16. Database | 51 papers in training set | Top 0.5% | 1.5%
17. Annals of Biomedical Engineering | 34 papers in training set | Top 0.8% | 1.3%
18. iScience | 1063 papers in training set | Top 19% | 1.3%
19. IEEE Access | 31 papers in training set | Top 0.6% | 1.2%
20. BMC Cancer | 52 papers in training set | Top 2% | 0.9%
21. BMC Infectious Diseases | 118 papers in training set | Top 4% | 0.9%
22. JAMIA Open | 37 papers in training set | Top 1% | 0.9%
23. European Journal of Nuclear Medicine and Molecular Imaging | 19 papers in training set | Top 0.2% | 0.9%
24. Informatics in Medicine Unlocked | 21 papers in training set | Top 0.9% | 0.9%
25. Journal of Translational Medicine | 46 papers in training set | Top 3% | 0.7%
26. PeerJ | 261 papers in training set | Top 15% | 0.7%
27. BMC Bioinformatics | 383 papers in training set | Top 7% | 0.7%
28. Frontiers in Bioinformatics | 45 papers in training set | Top 1% | 0.7%
29. British Journal of Cancer | 42 papers in training set | Top 2% | 0.6%
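
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below is only an arithmetic check on the 29 values shown in this list; the function name `ranks_to_reach` is an illustrative choice, and how the page itself computes the cutoff is not stated.

```python
# Predicted probabilities (percent) copied from the ranked list, ranks 1 to 29.
probabilities = [
    22.5, 12.3, 6.3, 4.9, 4.3,   # ranks 1-5
    4.0, 4.0, 2.4, 2.4, 2.1,     # ranks 6-10
    1.9, 1.7, 1.7, 1.7, 1.5,     # ranks 11-15
    1.5, 1.3, 1.3, 1.2, 0.9,     # ranks 16-20
    0.9, 0.9, 0.9, 0.9, 0.7,     # ranks 21-25
    0.7, 0.7, 0.7, 0.6,          # ranks 26-29
]


def ranks_to_reach(probs, threshold):
    """Return (rank, cumulative total) at the first rank whose running
    sum meets the threshold, or (None, total) if it is never reached."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total


rank, total = ranks_to_reach(probabilities, 50.0)
print(f"Threshold reached at rank {rank} with cumulative {total:.1f}%")
```

With these values the running sum first reaches 50% at rank 5 (about 50.3%), which agrees with the divider placed between ranks 5 and 6.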