
JARVIS, should this study be selected for full-text screening? Performance of a Joint AI-ReViewer Interactive Screening tool for systematic reviews

Barreto, G. H. C.; Burke, C.; Davies, P.; Halicka, M.; Paterson, C.; Swinton, P.; Saunders, B.; Higgins, J. P. T.

2026-04-11 · health informatics
DOI: 10.64898/2026.04.08.26350384 · medRxiv

Background: Systematic reviews are essential for evidence-based decision making in the health sciences, but their manual processes, particularly title and abstract screening, demand substantial time and resources. Recent advances in machine learning and large language models (LLMs) have shown promise in accelerating screening with high recall, but are often limited to modest efficiency gains, largely because a generalisable stopping criterion is lacking. Here, we introduce and report preliminary findings on the performance of JARVIS, a novel semi-automated active-learning system that integrates LLM-based reasoning using the PICOS framework, neural network-based classification, and human decision-making to facilitate abstract screening.

Methods: Datasets containing the authors' inclusion and exclusion decisions from six published systematic reviews were used to pilot the semi-automated screening system. Model performance was evaluated on recall, specificity, and area under the precision-recall curve (AUC-PR), using full-text inclusion as the ground truth. Estimated workload and financial savings were calculated by comparing total screening time and reviewer costs between the manual and semi-automated scenarios.

Results: Across the six review datasets, recall ranged from 98.2% to 100% and specificity from 97.9% to 99.2% at the defined stopping point. Across iterations, AUC-PR values ranged from 83.8% to 100%. Compared with human-only screening, JARVIS delivered workload savings of 71.0% to 93.6%. When a single reviewer read the excluded records, workload savings ranged from 35.6% to 46.8%.

Conclusion: The proposed semi-automated system substantially reduced reviewer workload while maintaining high recall, improving on previously reported approaches. Further validation in larger and more varied reviews, as well as prospective testing, is warranted.
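The reported metrics can be related to a screening confusion matrix: recall is the fraction of truly eligible records the system retains, specificity the fraction of ineligible records it correctly excludes, and the workload saving the fraction of records the reviewer never has to read. A minimal sketch, using purely illustrative counts (not figures from the paper) and assuming the reviewer reads only the records the model flags for inclusion:

```python
# Hedged sketch of abstract-screening metrics from a confusion matrix.
# The counts below are hypothetical, chosen only to fall near the
# ranges reported in the abstract.

def screening_metrics(tp, fp, tn, fn):
    """Recall, specificity, and reviewer workload saving for a
    semi-automated screen, assuming the reviewer reads only the
    records flagged for inclusion (tp + fp)."""
    recall = tp / (tp + fn)                  # eligible records caught
    specificity = tn / (tn + fp)             # ineligible records excluded
    total = tp + fp + tn + fn
    workload_saving = 1 - (tp + fp) / total  # fraction of records never read
    return recall, specificity, workload_saving

# Illustrative: 1000 records, 60 truly eligible at full text.
r, s, w = screening_metrics(tp=59, fp=20, tn=920, fn=1)
print(f"recall={r:.1%} specificity={s:.1%} workload saving={w:.1%}")
```

With these made-up counts, recall is 59/60 ≈ 98.3% and specificity 920/940 ≈ 97.9%, which is the kind of trade-off the abstract describes: near-total recall with most records excluded automatically.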

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association: 28.5% (61 papers in training set; Top 0.1%)
2. Research Synthesis Methods: 19.2% (20 papers in training set; Top 0.1%)
3. Journal of Biomedical Informatics: 6.5% (45 papers in training set; Top 0.2%)
   [50% of probability mass above this line]
4. Journal of Clinical Epidemiology: 5.0% (28 papers in training set; Top 0.1%)
5. JAMIA Open: 4.1% (37 papers in training set; Top 0.3%)
6. Journal of Medical Internet Research: 2.7% (85 papers in training set; Top 2%)
7. BMC Medical Informatics and Decision Making: 2.1% (39 papers in training set; Top 1%)
8. npj Digital Medicine: 2.1% (97 papers in training set; Top 2%)
9. BMC Medicine: 1.9% (163 papers in training set; Top 3%)
10. PLOS ONE: 1.9% (4510 papers in training set; Top 50%)
11. BMC Medical Research Methodology: 1.7% (43 papers in training set; Top 0.5%)
12. BMJ Health & Care Informatics: 1.7% (13 papers in training set; Top 0.4%)
13. Artificial Intelligence in Medicine: 1.4% (15 papers in training set; Top 0.4%)
14. International Journal of Medical Informatics: 1.0% (25 papers in training set; Top 1%)
15. JMIR Medical Informatics: 1.0% (17 papers in training set; Top 1%)
16. PLOS Digital Health: 0.9% (91 papers in training set; Top 2%)
17. Bioinformatics: 0.8% (1061 papers in training set; Top 9%)
18. Trials: 0.8% (25 papers in training set; Top 1%)
19. BMJ Open: 0.7% (554 papers in training set; Top 13%)
20. BMC Bioinformatics: 0.7% (383 papers in training set; Top 7%)
21. Scientific Reports: 0.7% (3102 papers in training set; Top 78%)
22. Wellcome Open Research: 0.7% (57 papers in training set; Top 3%)
23. Healthcare: 0.7% (16 papers in training set; Top 2%)
24. Computers in Biology and Medicine: 0.5% (120 papers in training set; Top 6%)
25. BMC Research Notes: 0.5% (29 papers in training set; Top 0.9%)
26. JAMA: 0.5% (17 papers in training set; Top 0.5%)
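The "top 3 journals account for 50% of the predicted probability mass" claim can be checked directly from the listed percentages: the cumulative mass first crosses 50% at rank 3. A small sketch (probabilities copied from the list above):

```python
# Find the rank at which the cumulative predicted probability mass
# first reaches 50%, using the top-5 percentages from the list above.
probs = [28.5, 19.2, 6.5, 5.0, 4.1]

cumulative = 0.0
for rank, p in enumerate(probs, start=1):
    cumulative += p
    if cumulative >= 50:
        print(f"50% of probability mass reached at rank {rank} "
              f"({cumulative:.1f}%)")
        break
```

Here 28.5% + 19.2% = 47.7% after two journals, and adding the third (6.5%) brings the total to 54.2%, so rank 3 is where the 50% threshold is crossed.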