Quantifying Cancer Clinical Trial Eligibility Using Artificial Intelligence-Based Matching

Goel, K. P.; Myall, N. J.; Dickerson, J.; Caswell-Jin, J. L.; Johnson, T.; Worth, J. E.; Gensheimer, M. F.

2026-06-05 oncology
10.64898/2026.06.03.26354859 medRxiv

PURPOSE: To develop and validate an artificial intelligence-enabled platform that converts unstructured cancer trial eligibility criteria into structured queries and quantifies trial eligibility across advanced/metastatic cancer trials.

METHODS: We downloaded actively recruiting US interventional treatment trials for advanced/metastatic breast cancer, colon cancer, and non-small cell lung cancer from ClinicalTrials.gov. Medical oncologists created 24 synthetic patient vignettes. A large language model converted trial eligibility criteria into Structured Query Language (SQL) code and patient information into structured records, enabling automated matching. Cancer details and treatment history were considered, but not laboratory results or comorbidities. Validation included physician editing of generated eligibility code for 30 trials, and blinded physician eligibility assessment for five trials. We then evaluated how age, ECOG performance status, sex, and ZIP code affected the number of eligible trials.

RESULTS: Of 833 candidate trials, 746 met inclusion criteria. In physician review of 30 trials, edits to generated SQL did not change any of 720 trial-patient eligibility determinations for 24 synthetic patients. In blinded validation across 120 trial-patient pairs, automated matching achieved 97% accuracy. Across synthetic patients, eligible trials ranged from 31 to 258 when there were no geographic restrictions. Eligibility decreased markedly with worse performance status and with geographic restriction (both p<0.001). Later-phase, randomized, and molecularly selective trials had fewer eligible patients.

CONCLUSION: AI-based structuring of trial eligibility criteria can support accurate, scalable measurement of potential cancer trial eligibility. In this demonstration, performance status, geography, and age were major determinants of eligibility across the active metastatic trial landscape.
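
The matching step described in METHODS (eligibility criteria expressed as SQL, patients stored as structured records) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the table schema, column names, trial identifiers, and WHERE clauses are hypothetical and are not taken from the study's platform, and the clauses are written by hand rather than generated by a language model.

```python
import sqlite3

# Hypothetical patient schema; the column names are assumptions, not the study's.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "patient_id INTEGER PRIMARY KEY, cancer_type TEXT, "
    "age INTEGER, ecog INTEGER, prior_lines INTEGER)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?, ?)",
    [
        (1, "NSCLC", 64, 1, 1),
        (2, "breast", 71, 2, 3),
        (3, "colon", 58, 0, 0),
    ],
)

# Hypothetical trials: each eligibility text is represented by a SQL WHERE
# clause, standing in for what a language model might produce.
trial_criteria = {
    "TRIAL-A": "cancer_type = 'NSCLC' AND age >= 18 AND ecog <= 1",
    "TRIAL-B": "cancer_type IN ('breast', 'colon') AND ecog <= 2 AND prior_lines <= 2",
}


def eligible_patients(connection, where_clause):
    """Return the sorted IDs of patients who satisfy one trial's WHERE clause."""
    # The clause is interpolated directly into the query, which is only
    # acceptable for trusted, reviewed SQL such as these hand-written examples.
    query = f"SELECT patient_id FROM patients WHERE {where_clause} ORDER BY patient_id"
    return [row[0] for row in connection.execute(query)]


# Trial -> eligible patient IDs.
matches = {trial: eligible_patients(conn, clause) for trial, clause in trial_criteria.items()}

# Patient -> number of eligible trials, the quantity the abstract reports per patient.
all_ids = [row[0] for row in conn.execute("SELECT patient_id FROM patients ORDER BY patient_id")]
trials_per_patient = {pid: sum(pid in ids for ids in matches.values()) for pid in all_ids}

print(matches)
print(trials_per_patient)
```

Keeping each trial's criteria as a standalone SQL predicate means a reviewer can read and edit one trial's logic without touching the others, which is in the spirit of the physician editing step the abstract mentions.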

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank. Journal; papers in training set; percentile; predicted probability

1. JCO Clinical Cancer Informatics; 18 papers in training set; Top 0.1%; 26.4%
2. npj Digital Medicine; 97 papers in training set; Top 0.2%; 19.0%
3. PLOS Computational Biology; 1633 papers in training set; Top 7%; 4.9%
(50% of probability mass above)
4. npj Precision Oncology; 48 papers in training set; Top 0.1%; 4.0%
5. PLOS ONE; 4510 papers in training set; Top 37%; 3.7%
6. Scientific Reports; 3102 papers in training set; Top 34%; 3.7%
7. European Journal of Cancer; 10 papers in training set; Top 0.2%; 1.7%
8. Database; 51 papers in training set; Top 0.5%; 1.4%
9. Frontiers in Oncology; 95 papers in training set; Top 2%; 1.4%
10. Cancer Medicine; 24 papers in training set; Top 0.9%; 1.4%
11. iScience; 1063 papers in training set; Top 21%; 1.2%
12. JCO Precision Oncology; 14 papers in training set; Top 0.3%; 1.2%
13. Artificial Intelligence in Medicine; 15 papers in training set; Top 0.5%; 1.1%
14. Journal of Translational Medicine; 46 papers in training set; Top 2%; 1.1%
15. JAMA Network Open; 127 papers in training set; Top 3%; 1.0%
16. Nature Communications; 4913 papers in training set; Top 60%; 0.9%
17. Nature Cancer; 35 papers in training set; Top 1%; 0.9%
18. NAR Genomics and Bioinformatics; 214 papers in training set; Top 3%; 0.9%
19. Journal of Biomedical Informatics; 45 papers in training set; Top 1%; 0.8%
20. International Journal of Medical Informatics; 25 papers in training set; Top 1%; 0.8%
21. BMC Infectious Diseases; 118 papers in training set; Top 5%; 0.8%
22. Informatics in Medicine Unlocked; 21 papers in training set; Top 1%; 0.8%
23. Inflammatory Bowel Diseases; 15 papers in training set; Top 0.3%; 0.8%
24. Healthcare; 16 papers in training set; Top 2%; 0.8%
25. Cureus; 67 papers in training set; Top 5%; 0.8%
26. Biomedicines; 66 papers in training set; Top 3%; 0.8%
27. JAMIA Open; 37 papers in training set; Top 1%; 0.8%
28. PeerJ; 261 papers in training set; Top 14%; 0.8%
29. PLOS Digital Health; 91 papers in training set; Top 3%; 0.8%
30. Clinical Cancer Research; 58 papers in training set; Top 2%; 0.7%
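
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. This is a minimal sketch: the probabilities are the five highest values listed above, while the function name `journals_to_reach` is a hypothetical helper and not part of the page's own tooling.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the five highest-ranked journals listed above.
probs = [26.4, 19.0, 4.9, 4.0, 3.7]


def journals_to_reach(probabilities, threshold):
    """Return the smallest number of top-ranked entries whose running total
    reaches the threshold, or None if the threshold is never reached."""
    for count, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return count
    return None


k = journals_to_reach(probs, 50.0)
print(k)  # 3
```

The running totals are 26.4%, 45.4%, and roughly 50.3%, so the threshold is first crossed at the third journal.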