
Fast clinical trial identification using fuzzy-search elastic searches: retrospective validation with high-quality Cochrane benchmark

Otte, W. M.; van IJzendoorn, D. G.; Habets, P. C.; Vinkers, C. H.

2023-09-06 epidemiology
10.1101/2023.09.06.23295135 medRxiv

The synthesis of treatment effects relies on systematic reviews of intervention trials. This process is often laborious due to the need for precise search queries and manual study identification. Recent advancements in database architecture and natural language processing (NLP) offer a potential solution by enabling faster, more flexible searches and automated extraction of information from unstructured texts. Our study assesses the effectiveness of NLP-based literature searches within a novel database structure in comparison to the Cochrane Database of Systematic Reviews. We created a user-friendly elastic search database containing 36 million PubMed-indexed entries. We developed reliable filters for identifying randomized clinical trials and clinical intervention studies, as well as extracting relevant subtext related to population and intervention. Our results indicate a high precision of 0.74, recall of 0.81, and F1-score of 0.77 for population subtext, and a precision of 0.70, recall of 0.71, and an F1-score of 0.70 for intervention subtext. Our approach efficiently identified included studies in 90% of systematic reviews, missing no more than two trials compared to Cochrane. Furthermore, it produced fewer total hits than a comparable PubMed keyword search, demonstrating the potential of the new database structure to enhance the efficiency and effectiveness of aggregating clinical evidence.
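The F1-scores quoted in the abstract can be checked against the reported precision and recall, assuming the standard definition of F1 as the harmonic mean of the two (the abstract does not state the formula, so this is an assumption). A minimal sketch; the function name `f1_score` is illustrative and not taken from the paper:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
reported = {
    "population": (0.74, 0.81),
    "intervention": (0.70, 0.71),
}

for subtext, (precision, recall) in reported.items():
    print(f"{subtext}: F1 = {f1_score(precision, recall):.2f}")
# population: F1 = 0.77
# intervention: F1 = 0.70
```

Under that assumption, both recomputed values agree with the abstract's figures to two decimal places.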

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Research Synthesis Methods: 20 papers in training set, Top 0.1%, 42.8%
2. PLOS ONE: 4510 papers in training set, Top 21%, 8.9%
50% of probability mass above
3. Scientific Data: 174 papers in training set, Top 0.2%, 6.9%
4. Journal of the American Medical Informatics Association: 61 papers in training set, Top 0.6%, 4.7%
5. Nature Human Behaviour: 85 papers in training set, Top 0.6%, 4.5%
6. Scientific Reports: 3102 papers in training set, Top 32%, 3.9%
7. Journal of Clinical Epidemiology: 28 papers in training set, Top 0.2%, 3.1%
8. Journal of Biomedical Informatics: 45 papers in training set, Top 0.6%, 2.2%
9. npj Digital Medicine: 97 papers in training set, Top 2%, 1.9%
10. Nature Communications: 4913 papers in training set, Top 49%, 1.8%
11. BMC Medical Research Methodology: 43 papers in training set, Top 0.7%, 1.4%
12. PLOS Computational Biology: 1633 papers in training set, Top 21%, 1.0%
13. Pharmacoepidemiology and Drug Safety: 13 papers in training set, Top 0.3%, 1.0%
14. Database: 51 papers in training set, Top 0.8%, 0.8%
15. PLOS Biology: 408 papers in training set, Top 18%, 0.8%
16. BMC Medicine: 163 papers in training set, Top 6%, 0.8%
17. Epidemiology: 26 papers in training set, Top 0.7%, 0.5%
18. Journal of Medical Internet Research: 85 papers in training set, Top 5%, 0.5%
19. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 3%, 0.5%
20. Bioinformatics: 1061 papers in training set, Top 10%, 0.5%
21. International Journal of Epidemiology: 74 papers in training set, Top 3%, 0.5%
22. eLife: 5422 papers in training set, Top 62%, 0.5%
23. Healthcare: 16 papers in training set, Top 2%, 0.5%