Cochrane Evaluation of (Semi-) Automated Review (CESAR) Methods: Protocol for an adaptive platform study within reviews

Gartlehner, G.; Banda, S.; Callaghan, M.; Chase, J.-A.; Dobrescu, A.; Eisele-Metzger, A.; Flemyng, E.; Gardner, S.; Griebler, U.; Helfer, B.; Jemiolo, P.; Macura, B.; Minx, J. C.; Noel-Storr, A.; Rajabzadeh Tahmasebi, N.; Sharifan, A.; Meerpohl, J.; Thomas, J.

2026-04-15 · health informatics
doi: 10.64898/2026.04.13.26350802 · medRxiv
Abstract

Background: Artificial intelligence (AI) has the potential to improve the efficiency of evidence synthesis and reduce human error. However, robust methods for evaluating rapidly evolving AI tools within the practical workflows of evidence synthesis remain underdeveloped. This protocol describes a study design for assessing the effectiveness, efficiency, and usability of AI tools in comparison to traditional human-only workflows in the context of Cochrane systematic reviews.

Methods: Members of the Cochrane Evaluation of (Semi-) Automated Review (CESAR) Methods Project developed an adaptive platform study-within-a-review (SWAR) design, modeled after clinical platform trials. This design employs a master protocol to concurrently evaluate multiple AI tools (interventions) against a standard human-only process (control) across three key review tasks: title and abstract screening, full-text screening, and data extraction. The adaptive framework allows for the addition or removal of AI tools based on interim performance analyses without necessitating a restart of the study. Performance will be assessed using metrics such as accuracy (sensitivity, specificity, precision), efficiency (time on task), response stability, impact of errors, and usability, in alignment with Responsible use of AI in evidence SynthEsis (RAISE) principles.

Results: The study will generate comparative data about the performance and usability of specific AI tools employed in a semi- or fully automated manner relative to standard human effort. The protocol provides a flexible framework for the assessment of AI tools in evidence synthesis, addressing the limitations of static, one-time evaluations.

Discussion: This study protocol presents a novel methodological approach to addressing the challenges of evaluating AI tools for evidence syntheses. By validating entire workflows rather than individual technologies, the findings will establish an evidence base for determining the viability of integrating AI into evidence-synthesis workflows. The adaptive design of this study is flexible and can be adopted by other investigators, ensuring that the evaluation framework remains relevant as new tools emerge.
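Since the abstract names sensitivity, specificity, and precision as the headline accuracy metrics, a minimal sketch may help make them concrete. The function and data below are illustrative, not from the protocol; they assume binary include/exclude screening decisions with the human-only workflow serving as the reference standard.

```python
# Minimal sketch (hypothetical, not from the CESAR protocol): confusion-matrix
# metrics for AI screening decisions against a human reference standard.
# True = "include", False = "exclude".

def screening_metrics(ai_labels, reference_labels):
    """Return sensitivity, specificity, and precision for paired decisions."""
    tp = sum(a and r for a, r in zip(ai_labels, reference_labels))
    tn = sum((not a) and (not r) for a, r in zip(ai_labels, reference_labels))
    fp = sum(a and (not r) for a, r in zip(ai_labels, reference_labels))
    fn = sum((not a) and r for a, r in zip(ai_labels, reference_labels))
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
    }

# Hypothetical example: six records screened by an AI tool vs. the human reference.
ai = [True, True, False, False, True, False]
ref = [True, False, False, False, True, True]
print(screening_metrics(ai, ref))
# {'sensitivity': 0.667, 'specificity': 0.667, 'precision': 0.667} (approx.)
```

In screening, sensitivity (recall of relevant studies) is typically the critical metric, since a missed relevant study cannot be recovered downstream, whereas a false inclusion only costs reviewer time at the next stage.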

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|-----:|---------|-----------------------:|-----------:|------------:|
| 1 | Research Synthesis Methods | 20 | Top 0.1% | 18.3% |
| 2 | PLOS ONE | 4510 | Top 23% | 8.3% |
| 3 | Journal of the American Medical Informatics Association | 61 | Top 0.4% | 8.1% |
| 4 | Journal of Clinical Epidemiology | 28 | Top 0.1% | 6.3% |
| 5 | BMJ Open | 554 | Top 4% | 6.2% |
| 6 | Trials | 25 | Top 0.3% | 4.2% |
| | *50% of probability mass above this line* | | | |
| 7 | BMC Medical Research Methodology | 43 | Top 0.3% | 3.5% |
| 8 | BMJ Health & Care Informatics | 13 | Top 0.2% | 3.5% |
| 9 | JAMIA Open | 37 | Top 0.6% | 2.3% |
| 10 | BMC Medicine | 163 | Top 3% | 2.0% |
| 11 | Journal of Biomedical Informatics | 45 | Top 0.7% | 2.0% |
| 12 | JMIR Research Protocols | 18 | Top 0.5% | 2.0% |
| 13 | BMC Medical Informatics and Decision Making | 39 | Top 1% | 1.9% |
| 14 | Journal of Medical Internet Research | 85 | Top 2% | 1.9% |
| 15 | Pilot and Feasibility Studies | 12 | Top 0.3% | 1.5% |
| 16 | Artificial Intelligence in Medicine | 15 | Top 0.4% | 1.5% |
| 17 | DIGITAL HEALTH | 12 | Top 0.4% | 1.3% |
| 18 | Healthcare | 16 | Top 1.0% | 1.3% |
| 19 | PLOS Digital Health | 91 | Top 2% | 1.3% |
| 20 | BMC Research Notes | 29 | Top 0.3% | 0.9% |
| 21 | Scientific Reports | 3102 | Top 71% | 0.9% |
| 22 | npj Digital Medicine | 97 | Top 3% | 0.9% |
| 23 | JMIR Medical Informatics | 17 | Top 1% | 0.8% |
| 24 | Computers in Biology and Medicine | 120 | Top 4% | 0.8% |
| 25 | BMJ Open Quality | 15 | Top 0.9% | 0.7% |
| 26 | PeerJ | 261 | Top 16% | 0.7% |
| 27 | Wellcome Open Research | 57 | Top 2% | 0.7% |
| 28 | Neuroscience & Biobehavioral Reviews | 43 | Top 1% | 0.7% |
| 29 | Systematic Reviews | 11 | Top 0.6% | 0.7% |
| 30 | International Journal of Medical Informatics | 25 | Top 2% | 0.7% |
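The 50% cutoff flagged in the table is simply the point where the cumulative predicted probability first reaches half the mass. A minimal sketch reproducing it, using the probabilities from the table above (top 30 journals only, so they sum to well under 100%):

```python
# Find how many top-ranked journals cover 50% of the predicted probability
# mass. Values are copied from the table above, in percent.
probs = [18.3, 8.3, 8.1, 6.3, 6.2, 4.2, 3.5, 3.5, 2.3, 2.0,
         2.0, 2.0, 1.9, 1.9, 1.5, 1.5, 1.3, 1.3, 1.3, 0.9,
         0.9, 0.9, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]

cumulative = 0.0
for rank, p in enumerate(probs, start=1):
    cumulative += p
    if cumulative >= 50.0:
        print(f"Top {rank} journals cover {cumulative:.1f}% of the mass")
        break
# Prints: Top 6 journals cover 51.4% of the mass
```

This confirms the stated cutoff: the first six journals cumulate to 51.4%, crossing the 50% threshold at rank 6.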