
MechaScreener: Large Language Model-Based Automated Screening for Systematic Reviews and Research

Forbes, C.; Carter, M.; Hudson, C.; Glasziou, P.; Clark, J.

2026-04-30 health informatics
10.64898/2026.04.28.26352009 medRxiv

Systematic Reviews (SRs) are the gold standard for evidence synthesis, but manual title and abstract screening of thousands of references creates a severe bottleneck. Existing automated tools have historically struggled to achieve the near-perfect recall (sensitivity) required for reliable reviews. We developed MechaScreener as a "zero-shot" automated screening tool that utilises a Large Language Model (LLM) to rank article relevance. The tool requires no initial training data or manual pre-screening: MechaScreener directly applies user-provided question elements (PICO) or inclusion/exclusion criteria to assign an inclusion probability score (1-5) to each reference. We evaluated the tool in two phases: a development phase using five reference libraries to optimise prompts, and an independent evaluation phase using 10 diverse Cochrane review libraries (covering both randomised controlled trials (RCTs) and non-RCTs) containing over 58,000 references. In the evaluation dataset, MechaScreener achieved a perfect mean recall of 1.00 (100%, pooled 95% CI: 0.98-1.00), so no relevant articles were missed. Concurrently, it achieved an overall mean specificity of 0.61 (61%, pooled 95% CI: 0.59-0.60). Specificity ranged from 0.21 in broad public health topics to 0.91 in precise pharmacological interventions, reflecting the tool's built-in conservatism when evaluating ambiguous abstracts. By safely eliminating over 60% of irrelevant literature during the initial screening phase without compromising recall, MechaScreener functions as a highly reliable but low-effort "first-pass" filter, allowing researchers to substantially reduce manual workloads and reallocate resources toward full-text review and data extraction.
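
To make the reported metrics concrete, the following is a minimal sketch of how recall and specificity could be computed once each reference has a 1-5 score. The abstract does not state which score cutoff separates excluded from retained references, so the `exclude_at_or_below` parameter, the `ScreenedReference` type, and the toy data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ScreenedReference:
    ref_id: str
    score: int       # hypothetical 1-5 inclusion score assigned by the screener
    relevant: bool   # ground truth: was the reference included by human reviewers?


def screening_metrics(refs, exclude_at_or_below=1):
    """Compute recall and specificity when references scoring at or below
    the cutoff are excluded. The cutoff value is an assumption."""
    tp = fn = tn = fp = 0
    for ref in refs:
        kept = ref.score > exclude_at_or_below
        if ref.relevant and kept:
            tp += 1          # relevant reference retained for manual review
        elif ref.relevant:
            fn += 1          # relevant reference wrongly excluded
        elif kept:
            fp += 1          # irrelevant reference retained
        else:
            tn += 1          # irrelevant reference correctly excluded
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"recall": recall, "specificity": specificity}


# Toy library (invented for illustration): 2 relevant, 5 irrelevant references.
toy_refs = [
    ScreenedReference("r1", 5, True),
    ScreenedReference("r2", 3, True),
    ScreenedReference("r3", 1, False),
    ScreenedReference("r4", 1, False),
    ScreenedReference("r5", 1, False),
    ScreenedReference("r6", 2, False),
    ScreenedReference("r7", 4, False),
]

metrics = screening_metrics(toy_refs)
print(metrics)
```

In this toy library both relevant references are retained and three of the five irrelevant ones are excluded, which mirrors the trade-off described in the abstract: recall is held at its maximum while specificity absorbs the cost of conservative scoring.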

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Research Synthesis Methods (20 papers in training set; Top 0.1%; 18.6%)
2. Journal of the American Medical Informatics Association (61 papers in training set; Top 0.1%; 18.6%)
3. Nature Communications (4913 papers in training set; Top 25%; 7.2%)
4. PLOS ONE (4510 papers in training set; Top 27%; 6.4%)

50% of probability mass above

5. Journal of Clinical Epidemiology (28 papers in training set; Top 0.1%; 4.9%)
6. Scientific Data (174 papers in training set; Top 0.5%; 3.6%)
7. npj Digital Medicine (97 papers in training set; Top 2%; 2.6%)
8. Bioinformatics (1061 papers in training set; Top 6%; 2.6%)
9. Scientific Reports (3102 papers in training set; Top 46%; 2.4%)
10. Journal of Biomedical Informatics (45 papers in training set; Top 0.7%; 1.9%)
11. JAMIA Open (37 papers in training set; Top 0.8%; 1.8%)
12. BMC Medicine (163 papers in training set; Top 3%; 1.8%)
13. Journal of Medical Internet Research (85 papers in training set; Top 3%; 1.7%)
14. PLOS Biology (408 papers in training set; Top 10%; 1.7%)
15. Philosophical Transactions of the Royal Society B (51 papers in training set; Top 3%; 1.5%)
16. Annals of Internal Medicine (27 papers in training set; Top 0.6%; 1.1%)
17. Trials (25 papers in training set; Top 1%; 1.0%)
18. The Lancet Digital Health (25 papers in training set; Top 0.9%; 0.9%)
19. JAMA (17 papers in training set; Top 0.3%; 0.7%)
20. Nature Human Behaviour (85 papers in training set; Top 5%; 0.7%)
21. JCO Clinical Cancer Informatics (18 papers in training set; Top 1%; 0.6%)
22. BMJ Open (554 papers in training set; Top 13%; 0.6%)
23. Neuroscience & Biobehavioral Reviews (43 papers in training set; Top 1%; 0.6%)
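
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. This is only a sketch: the function name `journals_to_reach` is hypothetical, and the inputs are the rounded percentages displayed for the five highest-ranked journals, so the result is approximate.

```python
def journals_to_reach(probabilities, target=0.5):
    """Return (count, cumulative mass) for the smallest number of top-ranked
    entries whose summed probability reaches the target."""
    cumulative = 0.0
    for count, p in enumerate(sorted(probabilities, reverse=True), start=1):
        cumulative += p
        if cumulative >= target:
            return count, cumulative
    return None, cumulative  # target not reached with the given entries


# Rounded probabilities as displayed for ranks 1-5 in the list.
top_listed = [0.186, 0.186, 0.072, 0.064, 0.049]

count, mass = journals_to_reach(top_listed)
print(count, round(mass, 3))
```

With the displayed values, the first three journals sum to about 0.444 and the fourth brings the total to about 0.508, which is consistent with the 50% marker sitting after rank 4.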