Breaking the Extraction Bottleneck: A Single AI Agent Achieves Statistical Equivalence with Human-Extracted Meta-Analysis Data Across Five Agricultural Datasets
Halpern, M.
Show abstract
BackgroundData extraction is the primary bottleneck in meta-analysis, consuming weeks of researcher time with single-extractor error rates of 17.7%. Existing LLM-based systems achieve only 26-36% accuracy on continuous outcomes, and no study has validated AI-extracted continuous data against multiple independent datasets using formal equivalence testing. MethodsA single AI agent (Claude Opus 4.6) extracted treatment means, control means, sample sizes, and variance measures from source PDFs across five published agricultural meta-analyses spanning zinc biofortification, biostimulant efficacy, biochar amendments, predator biocontrol, and elevated CO2 effects on plant mineral nutrition. Observations were matched to reference standards using an LLM-driven alignment method. Validation employed proportional TOST equivalence testing, ICC(3,1), Bland-Altman analysis, and source-type stratification. ResultsAcross five datasets, the agent produced 1,149 matched observations from 136 papers. Pearson correlations ranged from 0.984 to 0.999. Proportional TOST confirmed statistical equivalence for all five datasets (all p < 0.05). Table-sourced observations achieved 5.5x lower median error than figure-sourced observations. Aggregate effects were reproduced within 0.01-1.61 pp of published values. Independent duplicate runs confirmed extraction stability (within 0.09-0.23 pp). ConclusionsA single AI agent achieves statistical equivalence with human-extracted meta-analysis data across five independent agricultural datasets. The approach reduces extraction cost by approximately one to two orders of magnitude while maintaining accuracy sufficient for aggregate meta-analytic pooling. HighlightsO_ST_ABSWhat is already knownC_ST_ABSO_LIData extraction is the primary bottleneck in meta-analysis, with single-extractor error rates of 17.7% C_LIO_LIExisting LLM-based extraction systems achieve only 26-36% accuracy on continuous outcomes C_LIO_LINo study has validated AI extraction against multiple independent datasets using formal equivalence testing C_LI What is newO_LIA single AI agent achieves statistical equivalence with human-extracted data across five agricultural meta-analyses (1,149 observations, 136 papers) C_LIO_LILLM-driven alignment resolves the previously underappreciated bottleneck of moderator matching, improving correlations from 0.377-0.812 to 0.984-0.997 without changing extracted values C_LIO_LITable-sourced observations achieve 5.5x lower error than figure-sourced data C_LI Potential impact for RSM readersO_LIProvides a validated, reproducible workflow for AI-assisted data extraction in meta-analysis C_LIO_LIDemonstrates that most apparent "extraction error" in validation studies is actually alignment error C_LIO_LIOffers practical quality signals (source-type labeling) for downstream meta-analysts C_LI
Matching journals
The top 6 journals account for 50% of the predicted probability mass.