Assessing Large Language Models for Oncology Data Inference from Radiology Reports

Chen, L.-C.; Zack, T.; Demirci, A.; Sushil, M.; Miao, B.; Kasap, C.; Butte, A. J.; Collisson, E.; Hong, J.

2024-05-23 oncology
10.1101/2024.05.23.24307579
Abstract

Purpose: We examined the effectiveness of proprietary and open large language models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports.

Methods: We analyzed 203 deidentified radiology reports, manually annotated for disease status, location, and indeterminate nodules needing follow-up. Utilizing GPT-4, GPT-3.5-turbo, and open models such as Gemma-7B and Llama3-8B, we employed strategies such as ablation and prompt engineering to boost accuracy. Discrepancies between human and model interpretations were reviewed by a secondary oncologist.

Results: Among 164 patients with pancreatic adenocarcinoma, GPT-4 showed the highest accuracy in inferring disease status, achieving 75.5% correctness (F1-micro). The open models Mistral-7B and Llama3-8B performed comparably, with accuracies of 68.6% and 61.4%, respectively. Mistral-7B excelled at deriving correct inferences directly from "Objective Findings". Most tested models proficiently identified disease-containing anatomical locations from a list of choices, with GPT-4 and Llama3-8B showing near parity in precision and recall for disease-site identification. However, open models struggled to differentiate benign post-surgical changes from malignancy, which reduced their precision in identifying findings indeterminate for cancer. A secondary review occasionally favored GPT-3.5's interpretations, indicating variability in human judgment.

Conclusion: LLMs, especially GPT-4, are proficient at deriving oncological insights from radiology reports. Their performance is enhanced by effective summarization strategies, demonstrating their potential in clinical support and healthcare analytics. This study also underscores the possibility of zero-shot open-model utility in environments where proprietary models are restricted. Finally, by providing a set of annotated radiology reports, this paper presents a valuable dataset for further LLM research in oncology.
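The abstract reports disease-status accuracy as F1-micro. As a point of reference, for single-label multiclass classification, micro-averaged F1 reduces to plain accuracy, since every misclassification contributes exactly one false positive (for the predicted class) and one false negative (for the true class). A minimal sketch, with illustrative labels that are not drawn from the study's dataset:

```python
# Hedged sketch of micro-averaged F1 for single-label multiclass predictions.
# Each error counts as one FP (toward the predicted class) and one FN
# (toward the true class), so micro-F1 coincides with accuracy here.

def f1_micro(y_true, y_pred):
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = fn = len(y_true) - tp  # every error is simultaneously an FP and an FN
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative per-report disease-status labels (hypothetical, not study data):
y_true = ["progression", "stable", "response", "stable"]
y_pred = ["progression", "stable", "stable", "stable"]
print(f1_micro(y_true, y_pred))  # 3 of 4 correct -> 0.75
```

This is why a single "75.5% correctness (F1-micro)" figure can be read directly as the fraction of reports labeled correctly.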

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

Rank | Journal | Based on | Percentile | Probability
1 | JCO Clinical Cancer Informatics | 14 papers | Top 0.1% | 18.0%
2 | Scientific Reports | 701 papers | Top 26% | 6.8%
3 | PLOS ONE | 1737 papers | Top 64% | 6.2%
4 | npj Precision Oncology | 14 papers | Top 0.2% | 5.6%
5 | Cancers | 57 papers | Top 3% | 4.8%
6 | Cancer Medicine | 17 papers | Top 0.6% | 4.8%
7 | Computers in Biology and Medicine | 39 papers | Top 2% | 3.1%
8 | International Journal of Radiation Oncology*Biology*Physics | 13 papers | Top 0.8% | 3.0%
--- 50% of probability mass above this line ---
9 | Frontiers in Oncology | 34 papers | Top 3% | 3.0%
10 | npj Digital Medicine | 85 papers | Top 6% | 3.0%
11 | eLife | 262 papers | Top 10% | 2.6%
12 | BMC Cancer | 21 papers | Top 2% | 2.5%
13 | PeerJ | 46 papers | Top 3% | 2.5%
14 | Diagnostics | 36 papers | Top 3% | 1.7%
15 | JCO Precision Oncology | 11 papers | Top 2% | 1.4%
16 | Radiotherapy and Oncology | 11 papers | Top 1% | 1.4%
17 | British Journal of Cancer | 22 papers | Top 3% | 1.4%
18 | PLOS Computational Biology | 141 papers | Top 7% | 1.4%
19 | Scientific Data | 30 papers | Top 2% | 1.4%
20 | JMIR Formative Research | 31 papers | Top 4% | 1.3%
21 | Nature Medicine | 88 papers | Top 12% | 1.3%
22 | Clinical Cancer Research | 22 papers | Top 3% | 1.3%
23 | JAMA Network Open | 125 papers | Top 16% | 0.9%
24 | Journal of the American Medical Informatics Association | 53 papers | Top 6% | 0.9%
25 | Nature Communications | 483 papers | Top 39% | 0.8%
26 | JAMIA Open | 35 papers | Top 5% | 0.8%
27 | JMIR Medical Informatics | 16 papers | Top 4% | 0.8%
28 | Neuro-Oncology Advances | 14 papers | Top 2% | 0.7%
29 | Breast Cancer Research | 11 papers | Top 2% | 0.7%
30 | Cancer Epidemiology, Biomarkers & Prevention | 14 papers | Top 4% | 0.7%
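The "top 8 journals account for 50% of the predicted probability mass" statement can be checked directly from the listed percentages: the cumulative mass first reaches 50% at rank 8. A small sketch, using the probabilities copied from the list above (ranks 1 through 10):

```python
# Cumulative-mass check for the 50% cutoff in the journal prediction list.
# Probabilities are the per-journal percentages shown above, ranks 1-10.
probs = [18.0, 6.8, 6.2, 5.6, 4.8, 4.8, 3.1, 3.0, 3.0, 3.0]

total = 0.0
for rank, p in enumerate(probs, start=1):
    total += p
    if total >= 50.0:
        # First rank whose cumulative probability mass reaches 50%.
        print(rank, round(total, 1))  # 8 52.3
        break
```

Through rank 7 the cumulative mass is 49.3%, and adding rank 8's 3.0% pushes it to 52.3%, which is why the divider sits after row 8.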