
Performance of Large Language Models in Automated Medical Literature Screening: A Systematic Review and Meta-analysis

Chenggong, X.; Weichang, K.; Liuting, P.; Diaoxin, Q.; Yuxuan, Y.; Bin, W.; Liang, H.

2026-03-19 epidemiology
10.64898/2026.03.17.26348656 medRxiv

Objective: To systematically evaluate the diagnostic performance of large language models (LLMs) in automated medical literature screening and to determine their potential role in supporting evidence synthesis workflows.

Methods: A systematic review and meta-analysis was conducted according to PRISMA-DTA guidance. PubMed, Web of Science, Embase, the Cochrane Library and Google Scholar were searched from 1 January 2022 to 17 November 2025. Studies assessing LLMs for automated title and abstract screening or full-text eligibility assessment in medical literature were included. Diagnostic accuracy metrics were extracted and pooled using a bivariate random-effects model and hierarchical summary receiver operating characteristic (HSROC) analysis. Subgroup analyses and meta-regression were performed to explore sources of heterogeneity.

Results: Eighteen studies published between 2023 and 2025 were included. In title and abstract screening, pooled sensitivity was 0.92, pooled specificity was 0.94, and the summary ROC (SROC) area under the curve (AUC) was 0.98. In full-text screening, pooled sensitivity and specificity were both 0.99, and the AUC was 0.99. Prompt strategies incorporating examples or chain-of-thought reasoning significantly improved sensitivity. Most models were deployed without task-specific fine-tuning and still achieved strong performance. Subgroup analyses and meta-regression did not identify significant sources of heterogeneity. Many studies also reported substantial efficiency gains, including large reductions in screening workload, time and cost.

Conclusion: LLMs demonstrate high diagnostic accuracy for automated medical literature screening, particularly in full-text assessment. These models show strong potential as high-sensitivity assistive tools that can substantially reduce manual screening burden while supporting evidence synthesis. Further methodological optimization and validation in large-scale real-world settings are required to establish their long-term role in evidence-based medicine.
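To make the idea of "pooled sensitivity" concrete, the sketch below pools per-study proportions on the logit scale with a random-effects model. It is an illustrative assumption, not the authors' analysis: the review reports a bivariate random-effects model with HSROC analysis, which models sensitivity and specificity jointly, whereas this sketch deliberately swaps in the simpler univariate DerSimonian-Laird method and pools each measure separately. The function name `pool_proportion_dl`, the 0.5 continuity correction, and the example counts are all hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportion_dl(events: list[int], totals: list[int]) -> tuple[float, float]:
    """Hypothetical helper: univariate DerSimonian-Laird random-effects pooling
    of per-study proportions (e.g. sensitivity = TP / (TP + FN)) on the logit scale.

    Returns (pooled proportion, between-study variance tau^2 on the logit scale).
    A 0.5 continuity correction is added to each cell so that studies with
    0% or 100% observed accuracy still have a finite logit and variance.
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        e_adj = e + 0.5   # corrected event count
        n_adj = n + 1.0   # corrected total (0.5 added to both cells)
        ys.append(logit(e_adj / n_adj))
        vs.append(1.0 / e_adj + 1.0 / (n_adj - e_adj))  # variance of the logit

    # Fixed-effect (inverse-variance) estimate and Cochran's Q
    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))

    # DerSimonian-Laird moment estimate of tau^2, truncated at zero
    df = len(ys) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return inv_logit(y_re), tau2


if __name__ == "__main__":
    # Made-up TP/FN counts for three studies, for illustration only
    tp = [45, 88, 130]
    fn = [5, 6, 10]
    sens, tau2 = pool_proportion_dl(tp, [a + b for a, b in zip(tp, fn)])
    print(f"pooled sensitivity ~ {sens:.3f}, tau^2 = {tau2:.3f}")
```

Pooled specificity would be obtained the same way from TN and TN + FP counts. A faithful reproduction of the review would instead fit the bivariate model, which also estimates the correlation between sensitivity and specificity across studies.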
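The abstract reports "large reductions in screening workload" without naming a metric. One common way screening-automation studies quantify this is work saved over sampling (WSS); whether the included studies used WSS is an assumption here. The minimal sketch below treats the LLM's include/exclude decision as the index test and human reviewers' decisions as the reference standard; the function name `screening_metrics` is hypothetical.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Hypothetical helper: accuracy and workload metrics for one screening run.

    tp/fn: relevant records the model included/excluded.
    tn/fp: irrelevant records the model excluded/included.
    """
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # share of relevant records kept
    specificity = tn / (tn + fp)   # share of irrelevant records excluded
    # Share of all records the model excludes, i.e. records humans would skip
    excluded_fraction = (tn + fn) / n
    # Work saved over sampling: savings beyond what random sampling
    # would achieve at the same recall
    wss = excluded_fraction - (1.0 - sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "excluded_fraction": excluded_fraction,
        "wss": wss,
    }


if __name__ == "__main__":
    # Made-up counts for a 1,000-record screen, for illustration only
    print(screening_metrics(tp=95, fp=50, tn=850, fn=5))
```

Note that the excluded fraction alone can look impressive even when sensitivity is poor; WSS penalizes missed relevant records, which matters for the high-sensitivity, assistive role the authors propose.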

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. npj Digital Medicine (97 papers in training set; top 0.4%): 13.0%
2. Research Synthesis Methods (20 papers in training set; top 0.1%): 8.6%
3. Journal of Clinical Epidemiology (28 papers in training set; top 0.1%): 6.5%
4. Journal of Medical Internet Research (85 papers in training set; top 0.9%): 5.0%
5. BMC Medical Research Methodology (43 papers in training set; top 0.2%): 4.4%
6. International Journal of Medical Informatics (25 papers in training set; top 0.3%): 4.4%
7. Scientific Reports (3102 papers in training set; top 30%): 4.0%
8. Journal of the American Medical Informatics Association (61 papers in training set; top 0.7%): 3.7%
9. Healthcare (16 papers in training set; top 0.2%): 3.1%

50% of probability mass above

10. PLOS ONE (4510 papers in training set; top 43%): 2.8%
11. PLOS Biology (408 papers in training set; top 6%): 2.1%
12. BMC Medicine (163 papers in training set; top 2%): 2.1%
13. BMC Medical Informatics and Decision Making (39 papers in training set; top 1%): 2.1%
14. Journal of Biomedical Informatics (45 papers in training set; top 0.7%): 1.9%
15. PLOS Digital Health (91 papers in training set; top 2%): 1.5%
16. Frontiers in Medicine (113 papers in training set; top 4%): 1.4%
17. Pharmacoepidemiology and Drug Safety (13 papers in training set; top 0.3%): 1.4%
18. BMC Bioinformatics (383 papers in training set; top 5%): 1.3%
19. JMIRx Med (31 papers in training set; top 1%): 1.0%
20. BMC Infectious Diseases (118 papers in training set; top 4%): 0.9%
21. Nature Human Behaviour (85 papers in training set; top 4%): 0.9%
22. Biology Methods and Protocols (53 papers in training set; top 2%): 0.8%
23. Frontiers in Digital Health (20 papers in training set; top 1%): 0.8%
24. JAMA Network Open (127 papers in training set; top 4%): 0.8%
25. British Journal of Cancer (42 papers in training set; top 1%): 0.8%
26. JMIR Medical Informatics (17 papers in training set; top 1%): 0.8%
27. BMJ Open (554 papers in training set; top 12%): 0.8%
28. Cancer Medicine (24 papers in training set; top 1%): 0.8%
29. Systematic Reviews (11 papers in training set; top 0.5%): 0.8%
30. eBioMedicine (130 papers in training set; top 4%): 0.8%