
Implementation of Human-in-the-Loop ChatGPT-based Patient Screening Across Multiple Diverse Clinical Trials

Dohopolski, M.; Esselink, K.; Desai, N.; Grones, B.; Patel, T.; Jiang, S.; Peterson, E.; Navar, A. M.

2026-03-27 · health informatics · medRxiv · DOI: 10.64898/2026.03.20.26348890

Purpose: Manual screening for trial eligibility is inefficient and costly. We prospectively evaluated a large language model (LLM)-assisted prescreening workflow across multiple active trials.

Methods: We deployed a retrieval-augmented generation (RAG) LLM pipeline across multiple trials at an academic medical center. The LLM used structured electronic health record data and free-text notes to classify each criterion as met, likely met, likely not met, not met, uncertain, or no documentation found, with an accompanying rationale. Coordinators received a patient list sorted by LLM-derived eligibility, reviewed each case, and documented their assessment of individual criteria and the final prescreening status (success vs failure). Criterion-level performance (accuracy, sensitivity, specificity, positive predictive value [PPV], negative predictive value [NPV], and F1 score) was calculated and tracked over time. Patient prescreening status was also evaluated as a function of the percentage of individual AI-labeled criteria met (60–80% and ≥80%).

Results: From October 2024 to September 2025, 39,182 patients were prescreened using the LLM workflow across 26 studies (21 oncology, 5 non-oncology) encompassing 112 distinct criteria. A total of 914 patients with a high likelihood of eligibility underwent coordinator review (5,096 criteria evaluated). Aggregated criterion-level performance was as follows: accuracy 0.94 (95% CI, 0.92–0.96), sensitivity 0.98 (0.97–0.99), specificity 0.81 (0.71–0.88), PPV 0.95 (0.92–0.97), NPV 0.93 (0.90–0.95), and F1 score 0.97 (0.95–0.97). Twenty-seven criterion prompts across 14 of 26 trials were automatically updated based on coordinator feedback. Patients with ≥80% of AI-labeled criteria classified as met or likely met were more likely to be reviewed by coordinators (93.7% [372/397] vs 55.1% [544/987]) and more likely to be labeled prescreening successes (43.5% [162/372] vs 19.1% [104/544]) than those with 60–80%. The average cost was $0.12 per patient.

Conclusion: An LLM-assisted, human-in-the-loop prescreening workflow demonstrated high criterion-level performance at low cost across a diverse set of actively enrolling clinical trials. Structured coordinator feedback enabled an automated learning system, improving screening efficiency while preserving necessary human oversight.
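The evaluation logic described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the label-collapsing rule (treating "met" and "likely met" as positive, everything else as negative), the function names, and the data shapes are all assumptions. It shows how the six-way criterion labels would be scored against coordinator review with standard confusion-matrix metrics, and how a per-patient fraction of criteria met could drive the 60–80% vs ≥80% bucketing.

```python
from dataclasses import dataclass

# Assumed collapsing rule: "met"/"likely met" count as positive; the other
# four labels (likely not met, not met, uncertain, no documentation found)
# count as negative. The paper does not specify this mapping.
POSITIVE_LABELS = {"met", "likely met"}

def to_binary(label: str) -> bool:
    return label in POSITIVE_LABELS

@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float

def score(llm_labels, coordinator_labels) -> Metrics:
    """Criterion-level performance of LLM labels against coordinator review."""
    tp = fp = tn = fn = 0
    for pred, truth in zip(llm_labels, coordinator_labels):
        p, t = to_binary(pred), to_binary(truth)
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return Metrics(
        accuracy=(tp + tn) / total,
        sensitivity=sens,
        specificity=tn / (tn + fp) if tn + fp else 0.0,
        ppv=ppv,
        npv=tn / (tn + fn) if tn + fn else 0.0,
        # F1 is the harmonic mean of PPV (precision) and sensitivity (recall)
        f1=2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0,
    )

def fraction_met(patient_criterion_labels) -> float:
    """Fraction of a patient's criteria labeled met/likely met; patients
    could be sorted and bucketed on this value (e.g. >=0.8 vs 0.6-0.8)."""
    labels = list(patient_criterion_labels)
    return sum(to_binary(l) for l in labels) / len(labels)
```

In this sketch the sorted coordinator work list falls out of `fraction_met`: rank patients descending by that value and review the ≥80% bucket first, which matches the abstract's finding that the high-fraction bucket was both reviewed more often and enriched for prescreening successes.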

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Journal of the American Medical Informatics Association | 61 | Top 0.1% | 22.0%
2 | npj Digital Medicine | 97 | Top 0.3% | 17.1%
3 | JCO Clinical Cancer Informatics | 18 | Top 0.1% | 12.0%
4 | PLOS ONE | 4510 | Top 41% | 3.5%
5 | Journal of Medical Internet Research | 85 | Top 2% | 3.0%
6 | Cancer Medicine | 24 | Top 0.5% | 2.5%
7 | BMJ Health & Care Informatics | 13 | Top 0.3% | 2.4%
8 | Annals of Internal Medicine | 27 | Top 0.3% | 2.3%
9 | The Lancet Digital Health | 25 | Top 0.3% | 2.0%
10 | JAMIA Open | 37 | Top 0.7% | 1.8%
11 | Nature Communications | 4913 | Top 49% | 1.8%
12 | JAMA Network Open | 127 | Top 2% | 1.7%
13 | Journal of Clinical Epidemiology | 28 | Top 0.3% | 1.7%
14 | BMC Medical Research Methodology | 43 | Top 0.7% | 1.4%
15 | Scientific Reports | 3102 | Top 67% | 1.2%
16 | PLOS Digital Health | 91 | Top 2% | 1.2%
17 | Clinical and Translational Science | 21 | Top 0.6% | 1.2%
18 | JMIR Medical Informatics | 17 | Top 1% | 0.9%
19 | Trials | 25 | Top 2% | 0.8%
20 | eBioMedicine | 130 | Top 4% | 0.8%
21 | BMC Medicine | 163 | Top 7% | 0.7%
22 | BMJ Open | 554 | Top 13% | 0.7%
23 | Frontiers in Digital Health | 20 | Top 1% | 0.7%
24 | Journal of Biomedical Informatics | 45 | Top 2% | 0.7%
25 | Cell Reports Medicine | 140 | Top 9% | 0.7%
26 | BMC Medical Informatics and Decision Making | 39 | Top 3% | 0.7%