
Accuracy and efficiency of using artificial intelligence for data extraction in systematic reviews. A noninferiority study within reviews

Lee, D. C. W.; O'Brien, K. M.; Presseau, J.; Yoong, S.; Lecathelinais, C.; Wolfenden, L.; Thomas, J.; Arno, A.; Hutton, B.; Hodder, R. K.

2026-02-27 public and global health

10.64898/2026.02.25.26347053 medRxiv


Background: Systematic reviews are important for informing public health policies and program selection; however, they are time- and resource-intensive. Artificial intelligence (AI) offers a way to reduce these labour-intensive requirements for various aspects of systematic review production, including data extraction. To date, there is limited robust evidence evaluating the accuracy and efficiency of AI for data extraction. This study within a review (SWAR) aimed to determine whether human data extraction assisted by an AI research assistant (Elicit®) is noninferior to human-only data extraction in terms of accuracy (i.e. agreement) and time-to-completion. Secondary aims included comparing error types and costs.

Methods: A two-arm noninferiority SWAR was conducted to compare AI-assisted and human-only data extraction from 50 RCTs of chronic disease interventions. Participants were randomised to extract all data required for conducting a review, using either the AI-assisted or human-only method. Accuracy was assessed using a three-point rubric by an independent assessor blinded to group allocation, based on agreement between the extracted data and the assessor. Accuracy scores were standardized to a 0-100 scale. Analysis included overall and subgroup accuracy (data group and data type) using paired t-tests. Time-to-completion was self-reported by data extractors. Errors were coded by type and severity, and costs were calculated for data extraction, preparation of files, training and the Elicit® Pro subscription.

Results: There was no difference in overall accuracy between the AI-assisted and human-only arms (mean difference (MD) 0.57 on a 0-100 scale, 95% confidence interval (CI) -1.29, 2.43). Subgroup analysis by data group found AI-assisted extraction to be more accurate than human-only extraction for data variables describing the intervention and control groups (MD 4.75, 95% CI 2.13, 7.38), but otherwise no subgroup differences were observed. AI-assisted data extraction was significantly faster (MD 24.82 mins, 95% CI 18.80, 30.84). Error types (missed or omitted data: AI-assisted 3.6%, human-only 3.4%) and severity (minor errors: AI-assisted 6.7%, human-only 6.5%) were similar between arms, and AI-assisted extraction cost $181.98 less than human-only extraction across the 50 studies.

Conclusion: AI-assisted data extraction using Elicit® showed noninferior accuracy, faster completion times, similar error types and severity, and lower costs compared to human-only extraction. These efficiency gains, without loss in accuracy, suggest AI-assisted data extraction can replace one human-only data extractor in future systematic reviews of RCTs. Future research should explore different models of AI data extraction, such as two AI-assisted extractors or an AI-only extractor paired with a human-only extractor, and should compare AI-assisted with AI-only extraction.
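As a rough illustration of the noninferiority logic described in the Methods and Results, the sketch below computes a mean paired difference with a confidence interval and checks its lower bound against a margin. It is not the authors' analysis code: the function names are hypothetical, the noninferiority margin is not stated in the abstract and is supplied by the caller, and a normal quantile is used in place of the t quantile of the paired t-test (a large-sample approximation that keeps the example to the standard library).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_diff_ci(ai_scores, human_scores, level=0.95):
    """Mean paired difference (AI-assisted minus human-only) with an approximate CI.

    Assumption: a normal quantile replaces the t quantile used by a paired
    t-test, so the interval is slightly too narrow for small samples.
    """
    if len(ai_scores) != len(human_scores):
        raise ValueError("paired scores must have the same length")
    diffs = [a - h for a, h in zip(ai_scores, human_scores)]
    md = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return md, md - z * se, md + z * se


def is_noninferior(ci_lower, margin):
    """Higher accuracy is better, so noninferiority holds when the lower
    CI bound of (AI-assisted minus human-only) stays above -margin."""
    return ci_lower > -margin


# Illustrative accuracy scores on a 0-100 scale (made up, not study data).
md, lower, upper = paired_diff_ci([80, 90, 70, 85], [78, 91, 69, 80])

# Reported lower bound from the abstract (-1.29) against a HYPOTHETICAL margin of 5.
reported_ok = is_noninferior(-1.29, margin=5.0)
```

With the reported interval (-1.29, 2.43), the conclusion depends entirely on the prespecified margin: any margin larger than 1.29 points on the 0-100 scale would be satisfied, while a stricter one would not.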

