
Validation of an AI-Assisted Framework for Systematic Bias Assessment in Observational Studies

Etminan, M.; Rezaeianzadeh, R.; Douros, A.

2026-04-28 epidemiology
10.64898/2026.04.26.26351778 medRxiv

Background: The rapid expansion of medical literature has led to substantial variability and frequent contradictions in study findings, making it increasingly difficult to distinguish meaningful signals from noise. Much of this variability arises from differences in study methodology, where biases such as confounding, selection bias, and reverse causation can drive spurious associations. While artificial intelligence (AI)-assisted tools have been developed to support risk-of-bias assessment, most are designed for systematic reviews and are not tailored to identifying specific epidemiologic biases in observational studies. This highlights the need for structured, scalable approaches to evaluate study validity in real-world evidence.

Objective: To develop and validate an AI-assisted, expert-informed, rule-based framework (EpiVise) for systematically identifying and classifying key sources of bias in pharmacoepidemiologic studies, and to assess its agreement with expert evaluation.

Methods: We conducted a validation study using recently published pharmacoepidemiologic studies from high-impact journals (post-2025). Each study was independently assessed by the framework and by two expert epidemiologists across predefined bias domains, including measured confounding, confounding by indication, selection bias, immortal time bias, and disease latency. Agreement was evaluated using weighted kappa statistics. In the absence of a gold standard, expert judgment served as the reference benchmark. In a second phase, synthetic study scenarios with predefined embedded biases were constructed to assess the framework's ability to detect known bias structures under controlled conditions.

Results: In analyses of published studies (10 studies; 60 ratings), agreement between the framework and expert assessments was substantial (κ = 0.75; 95% confidence interval [CI], 0.60-0.86), with 12 discordant ratings (20.0%), all limited to adjacent categories and occurring primarily in the confounding by indication and selection bias domains. In synthetic study scenarios (10 studies; 50 ratings), agreement was similarly substantial, with 42 of 50 ratings concordant (84%) and a weighted kappa of 0.77 (95% CI, 0.67-0.87); discordances included both adjacent-category and extreme disagreements and were concentrated in the confounding by indication, selection bias, and prevalent user bias domains.

Conclusions: This AI-assisted, expert-informed framework, EpiVise, provides a scalable and reproducible approach for evaluating epidemiologic study validity, demonstrating substantial agreement with expert assessment. By systematically identifying key sources of bias, the framework has the potential to enhance the rigor and consistency of evidence evaluation, support peer review, and inform clinical, regulatory, and policy decision-making. Further validation across broader study designs and domains is warranted.
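
The abstract reports agreement as a weighted kappa but does not state the weighting scheme or the rating scale. As a rough illustration only, the sketch below computes a weighted kappa for two raters on an ordinal scale, assuming three hypothetical categories (0 = low, 1 = moderate, 2 = high concern) and linear weights by default. The function name, the category coding, and the toy ratings are assumptions made for this example and are not taken from the study.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories, scheme="linear"):
    """Weighted Cohen's kappa for two raters on an ordinal scale.

    Ratings must be integers in range(n_categories). The weighting
    scheme ("linear" or "quadratic") is an assumption of this sketch;
    the abstract does not say which one the authors used.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and equal in length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    valid = range(n_categories)
    if any(r not in valid for r in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must be integers in range(n_categories)")

    n = len(rater_a)
    power = 1 if scheme == "linear" else 2
    observed = Counter(zip(rater_a, rater_b))  # joint counts per cell
    marg_a = Counter(rater_a)                  # marginal counts, rater A
    marg_b = Counter(rater_b)                  # marginal counts, rater B

    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in valid:
        for j in valid:
            # Disagreement weight: 0 on the diagonal, 1 for the extremes.
            w = (abs(i - j) / (n_categories - 1)) ** power
            obs_disagreement += w * observed[(i, j)] / n
            exp_disagreement += w * (marg_a[i] / n) * (marg_b[j] / n)

    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - obs_disagreement / exp_disagreement


if __name__ == "__main__":
    # Toy ratings (hypothetical): one adjacent-category disagreement.
    framework = [0, 1, 2, 0, 1, 2]
    expert = [0, 1, 2, 1, 1, 2]
    print(round(weighted_kappa(framework, expert, 3), 3))               # 0.8
    print(round(weighted_kappa(framework, expert, 3, "quadratic"), 3))
```

Because the weights grow with the distance between categories, an adjacent-category disagreement lowers kappa less than an extreme one, which is why the abstract distinguishes the two kinds of discordance.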

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. BMC Medical Research Methodology: 43 papers in training set, Top 0.1%, 18.4%
2. Journal of Clinical Epidemiology: 28 papers in training set, Top 0.1%, 12.2%
3. Research Synthesis Methods: 20 papers in training set, Top 0.1%, 8.3%
4. American Journal of Epidemiology: 57 papers in training set, Top 0.1%, 8.3%
5. npj Digital Medicine: 97 papers in training set, Top 0.7%, 7.1%
50% of probability mass above
6. European Journal of Epidemiology: 40 papers in training set, Top 0.1%, 4.1%
7. PLOS ONE: 4510 papers in training set, Top 36%, 3.9%
8. Epidemiology: 26 papers in training set, Top 0.1%, 3.6%
9. International Journal of Epidemiology: 74 papers in training set, Top 0.6%, 3.6%
10. BMC Medicine: 163 papers in training set, Top 2%, 3.6%
11. Pharmacoepidemiology and Drug Safety: 13 papers in training set, Top 0.1%, 2.6%
12. PLOS Biology: 408 papers in training set, Top 7%, 2.1%
13. Nature Communications: 4913 papers in training set, Top 48%, 2.1%
14. Nature Human Behaviour: 85 papers in training set, Top 2%, 2.1%
15. BMJ Open: 554 papers in training set, Top 9%, 1.7%
16. Journal of the American Medical Informatics Association: 61 papers in training set, Top 1%, 1.5%
17. Journal of Biomedical Informatics: 45 papers in training set, Top 0.9%, 1.5%
18. PLOS Medicine: 98 papers in training set, Top 3%, 1.3%
19. PLOS Computational Biology: 1633 papers in training set, Top 25%, 0.7%
20. Scientific Reports: 3102 papers in training set, Top 76%, 0.7%
21. The Lancet Digital Health: 25 papers in training set, Top 1%, 0.6%