Benchmarking foundation models for improving confounding control in target trial emulation

Kleper, S. L.; Melamed, R. D.

2026-05-13 epidemiology
10.64898/2026.05.09.26352820 medRxiv
Machine learning models for causal inference aim to adjust for confounding factors, which are associated with both an exposure and an outcome and thereby create a spurious, biased association. However, these methods are rarely evaluated empirically to assess how well they mitigate such bias. Recent advances in knowledge representation, including both foundation models and knowledge graphs, could enrich these models, but rigorous evaluations are needed to assess their potential. Here, we ask whether enriching existing causal inference models with knowledge representations from foundation models can improve confounding control. Rather than using semi-simulated data to address this question, we focus on examples of real confounding: we emulate target randomized active comparator trials that are subject to confounding by indication. Our results can guide researchers aiming to develop or apply methods for discovering causal effects from observational data.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. BMC Medical Research Methodology: 43 papers in training set, Top 0.1%, 19.0%
2. PLOS Computational Biology: 1633 papers in training set, Top 4%, 8.4%
3. Research Synthesis Methods: 20 papers in training set, Top 0.1%, 7.3%
4. Epidemiology: 26 papers in training set, Top 0.1%, 6.5%
5. eLife: 5422 papers in training set, Top 16%, 4.9%
6. Nature Human Behaviour: 85 papers in training set, Top 0.5%, 4.9%

50% of probability mass above

7. Nature Communications: 4913 papers in training set, Top 35%, 4.4%
8. PLOS ONE: 4510 papers in training set, Top 33%, 4.4%
9. Scientific Reports: 3102 papers in training set, Top 34%, 3.7%
10. Statistics in Medicine: 34 papers in training set, Top 0.1%, 3.1%
11. BMC Medicine: 163 papers in training set, Top 2%, 2.1%
12. International Journal of Epidemiology: 74 papers in training set, Top 1.0%, 2.1%
13. npj Digital Medicine: 97 papers in training set, Top 2%, 2.1%
14. PLOS Biology: 408 papers in training set, Top 6%, 2.1%
15. Pharmacoepidemiology and Drug Safety: 13 papers in training set, Top 0.2%, 1.7%
16. Journal of Biomedical Informatics: 45 papers in training set, Top 0.8%, 1.7%
17. European Journal of Epidemiology: 40 papers in training set, Top 0.4%, 1.5%
18. Scientific Data: 174 papers in training set, Top 1%, 1.5%
19. Medical Decision Making: 10 papers in training set, Top 0.2%, 1.2%
20. American Journal of Epidemiology: 57 papers in training set, Top 1%, 0.9%
21. Journal of Clinical Epidemiology: 28 papers in training set, Top 0.5%, 0.9%
22. Epidemiology and Infection: 84 papers in training set, Top 3%, 0.8%
23. Nature Machine Intelligence: 61 papers in training set, Top 3%, 0.8%
24. Computers in Biology and Medicine: 120 papers in training set, Top 5%, 0.7%
25. Journal of the American Medical Informatics Association: 61 papers in training set, Top 2%, 0.7%
26. Trials: 25 papers in training set, Top 2%, 0.7%
27. Biometrics: 22 papers in training set, Top 0.2%, 0.7%
28. Epidemics: 104 papers in training set, Top 2%, 0.7%
29. BMC Cancer: 52 papers in training set, Top 3%, 0.5%
30. JAMIA Open: 37 papers in training set, Top 2%, 0.5%