
Bridging Data Gaps in Oncology: Large Language Models and Collaborative Filtering for Cancer Treatment Recommendations

Tang, T.; Li, A.; Tan, X.; Ji, Q.; Si, L.; Bao, L.

2025-04-07 oncology
10.1101/2025.04.07.25325243

Background: Patients with rare cancers face substantial challenges because few clinical trials study their diseases, which leaves limited evidence-based treatment options. Advances in large language models (LLMs) and recommendation algorithms offer new opportunities to use the full body of clinical trial information to improve clinical decisions.

Methods: We used an LLM to systematically extract and standardize more than 100,000 cancer trials from ClinicalTrials.gov. Each trial was annotated using a customized scoring system reflecting cancer-treatment interactions based on clinical outcomes and trial attributes. Using this structured data set, we implemented three state-of-the-art collaborative filtering algorithms to recommend potentially effective treatments across different cancer types.

Results: The LLM-driven data extraction process generated a comprehensive, rigorously curated database from fragmented clinical trial information, covering 78 cancer types and 5,315 distinct interventions. The recommendation models demonstrated high predictive accuracy (cross-validated RMSE: 0.49-0.62) and identified clinically meaningful new treatments for melanoma, which were independently validated by oncology experts.

Conclusions: This study provides a proof of concept that combining LLMs with sophisticated recommendation algorithms can systematically identify novel, clinically plausible cancer treatments. This integrated approach may accelerate the identification of effective therapies for rare cancers and ultimately improve patient outcomes by generating evidence-based treatment recommendations where traditional data sources remain limited.
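The abstract does not name its three collaborative filtering algorithms or define its scoring scale, so the following is only a hedged illustration of the general idea. It treats cancer types as "users", interventions as "items", and trial-derived scores as ratings, then fits biased matrix factorization by stochastic gradient descent (one common collaborative filtering technique, not necessarily any of the paper's methods) and reports k-fold cross-validated RMSE. The toy data, the score scale, the hyperparameters, and the function names `train_mf`, `rmse`, `kfold_rmse`, and `recommend` are all hypothetical.

```python
import math
import random

# Hypothetical toy data: (cancer_type_index, intervention_index, score).
# Real scores would come from the paper's (unspecified) trial scoring system.
TOY = [(0, 0, 2.5), (0, 1, 1.0), (0, 2, 2.0),
       (1, 0, 2.4), (1, 1, 1.1), (1, 3, 0.5),
       (2, 1, 2.2), (2, 2, 0.8), (2, 4, 1.9),
       (3, 0, 0.6), (3, 3, 2.1), (3, 4, 1.0)]


def train_mf(ratings, n_rows, n_cols, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit score ~ mu + b_row + b_col + <P_row, Q_col> by SGD on observed entries only."""
    rng = random.Random(seed)
    mu = sum(s for _, _, s in ratings) / len(ratings)  # global mean score
    br, bc = [0.0] * n_rows, [0.0] * n_cols             # per-cancer and per-intervention biases
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_rows)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_cols)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for r, c, s in data:
            err = s - (mu + br[r] + bc[c] + sum(P[r][f] * Q[c][f] for f in range(k)))
            br[r] += lr * (err - reg * br[r])
            bc[c] += lr * (err - reg * bc[c])
            for f in range(k):
                pf, qf = P[r][f], Q[c][f]
                P[r][f] += lr * (err * qf - reg * pf)
                Q[c][f] += lr * (err * pf - reg * qf)

    def predict(r, c):
        return mu + br[r] + bc[c] + sum(P[r][f] * Q[c][f] for f in range(k))

    return predict


def rmse(predict, triples):
    """Root-mean-square error of predictions against observed scores."""
    return math.sqrt(sum((s - predict(r, c)) ** 2 for r, c, s in triples) / len(triples))


def kfold_rmse(ratings, n_rows, n_cols, folds=5, seed=0, **kw):
    """Average held-out RMSE over k folds of the observed (cancer, intervention) pairs."""
    data = list(ratings)
    random.Random(seed).shuffle(data)
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [t for j, t in enumerate(data) if j % folds != i]
        scores.append(rmse(train_mf(train, n_rows, n_cols, seed=seed, **kw), test))
    return sum(scores) / len(scores)


def recommend(predict, row, n_cols, observed, top_n=3):
    """Rank interventions not yet observed for a cancer type by predicted score."""
    seen = {c for r, c, _ in observed if r == row}
    candidates = [(predict(row, c), c) for c in range(n_cols) if c not in seen]
    return [c for _, c in sorted(candidates, reverse=True)[:top_n]]


if __name__ == "__main__":
    model = train_mf(TOY, 4, 5)
    print("candidate interventions for cancer 0:", recommend(model, 0, 5, TOY))
    print("3-fold CV RMSE:", kfold_rmse(TOY, 4, 5, folds=3))
```

Because only observed cells enter the loss, unobserved cancer-intervention pairs are filled in purely from shared latent structure; this is what lets a sparsely studied cancer borrow evidence from better-studied ones in this kind of sketch.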

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank  Probability  Percentile  Based on      Journal
1     12.5%        Top 0.1%    14 papers     npj Precision Oncology
2     11.0%        Top 0.1%    14 papers     JCO Clinical Cancer Informatics
3     7.5%         Top 11%     483 papers    Nature Communications
4     7.5%         Top 58%     1737 papers   PLOS ONE
5     4.7%         Top 40%     701 papers    Scientific Reports
6     4.7%         Top 0.2%    11 papers     JCO Precision Oncology
7     2.9%         Top 4%      57 papers     Cancers
50% of probability mass above
8     2.8%         Top 9%      262 papers    eLife
9     2.8%         Top 1%      17 papers     Cancer Medicine
10    2.8%         Top 2%      22 papers     Clinical Cancer Research
11    2.4%         Top 2%      21 papers     BMC Cancer
12    2.4%         Top 8%      125 papers    JAMA Network Open
13    2.2%         Top 1%      13 papers     International Journal of Radiation Oncology*Biology*Physics
14    1.8%         Top 2%      22 papers     British Journal of Cancer
15    1.6%         Top 7%      141 papers    PLOS Computational Biology
16    1.6%         Top 41%     553 papers    BMJ Open
17    1.6%         Top 2%      14 papers     Cancer Epidemiology, Biomarkers & Prevention
18    1.3%         Top 5%      34 papers     Frontiers in Oncology
19    1.3%         Top 11%     85 papers     npj Digital Medicine
20    1.2%         Top 2%      14 papers     Journal for ImmunoTherapy of Cancer
21    0.8%         Top 2%      11 papers     Leukemia
22    0.8%         Top 2%      29 papers     Journal of Clinical Epidemiology
23    0.8%         Top 13%     100 papers    Proceedings of the National Academy of Sciences
24    0.7%         Top 3%      21 papers     Journal of Translational Medicine
25    0.7%         Top 11%     58 papers     Nature
26    0.7%         Top 2%      11 papers     Breast Cancer Research
27    0.7%         Top 2%      11 papers     Radiotherapy and Oncology
28    0.7%         Top 3%      11 papers     Informatics in Medicine Unlocked
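The claim that the top 7 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The short sketch below (the helper name `journals_for_mass` is hypothetical) finds the smallest number of top-ranked journals whose probabilities reach a target share. Because the displayed figures are rounded, the result is approximate: ranks 1 to 6 sum to 47.9% and ranks 1 to 7 to 50.8%.

```python
# Predicted probabilities (percent) for ranks 1-10, copied from the listing above.
PROBS = [12.5, 11.0, 7.5, 7.5, 4.7, 4.7, 2.9, 2.8, 2.8, 2.8]


def journals_for_mass(probs, target=50.0):
    """Return (k, cumulative) for the smallest k whose top-k probabilities reach target."""
    total = 0.0
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= target:
            return k, total
    return len(probs), total


if __name__ == "__main__":
    print(journals_for_mass(PROBS))
```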