Fast clinical trial identification using fuzzy-search elastic searches: retrospective validation with high-quality Cochrane benchmark
Otte, W. M.; van IJzendoorn, D. G.; Habets, P. C.; Vinkers, C. H.
The synthesis of treatment effects relies on systematic reviews of intervention trials. This process is often laborious due to the need for precise search queries and manual study identification. Recent advancements in database architecture and natural language processing (NLP) offer a potential solution by enabling faster, more flexible searches and automated extraction of information from unstructured texts. Our study assesses the effectiveness of NLP-based literature searches within a novel database structure in comparison to the Cochrane Database of Systematic Reviews. We created a user-friendly elastic search database containing 36 million PubMed-indexed entries. We developed reliable filters for identifying randomized clinical trials and clinical intervention studies, as well as extracting relevant subtext related to population and intervention. Our results indicate a high precision of 0.74, recall of 0.81, and F1-score of 0.77 for population subtext, and a precision of 0.70, recall of 0.71, and an F1-score of 0.70 for intervention subtext. Our approach efficiently identified included studies in 90% of systematic reviews, missing no more than two trials compared to Cochrane. Furthermore, it produced fewer total hits than a comparable PubMed keyword search, demonstrating the potential of the new database structure to enhance the efficiency and effectiveness of aggregating clinical evidence.
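As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded precision and recall values quoted in the abstract. The function name `f1_score` is an illustrative choice, not something taken from the authors' code, and because the inputs are already rounded to two decimals the result can only be expected to agree with the reported F1 to about two decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Rounded values as quoted in the abstract.
population_f1 = f1_score(0.74, 0.81)
intervention_f1 = f1_score(0.70, 0.71)

print(f"population F1:   {population_f1:.2f}")    # 0.77
print(f"intervention F1: {intervention_f1:.2f}")  # 0.70
```

Both recomputed values match the F1-scores stated in the abstract (0.77 for population subtext and 0.70 for intervention subtext).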
Matching journals
The top 2 journals account for 50% of the predicted probability mass.