Transportability of missing data models across study sites for research synthesis

Thiesmeier, R.; Madley-Dowd, P.; Ahlqvist, V.; Orsini, N.

2026-03-10 epidemiology
10.64898/2026.03.09.26347913 medRxiv

Introduction: Systematically missing covariates are a common challenge in medical research synthesis of quantitative data, particularly when individual participant data cannot be shared across study sites. Imputing covariate values in studies where they are systematically unobserved using information from sites where the covariate is observed implicitly assumes similarity of associations across studies. The behaviour of this assumption, and the bias arising from violating it, remains difficult to qualitatively reason about. Here, we evaluated a two-stage imputation approach for handling systematically missing covariates using simulations across a range of statistical and causal heterogeneity scenarios.

Methods: We conducted a simulation study with varying degrees of between-study heterogeneity and systematic differences in model parameters. A binary confounder was set to systematically missing in half of the studies. Study-specific effect estimates were combined using a two-stage meta-analytic model. The performance of the imputation approach was evaluated with the primary estimand being the pooled conditional confounding-adjusted exposure effect across all studies.

Results: Bias in the pooled adjusted effect estimate was small across scenarios with low to substantial between-study heterogeneity. Bias increased monotonically with increasingly pronounced differences in causal structures across study sites. Coverage remained close to the nominal level under low to substantial between-study heterogeneity, but deteriorated markedly as differences in causal structures between study sites became more severe.

Conclusion: The two-stage cross-site imputation approach produced valid pooled effect estimates across a wide range of simulated scenarios but showed monotonic sensitivity to differences in causal structures across studies. The results provide insight into the conditions under which cross-site imputation may be appropriate for handling systematically missing covariates in research synthesis.
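
As a rough illustration of the pooling step in a two-stage meta-analytic model, the sketch below combines study-specific estimates and standard errors into one pooled estimate. The abstract does not say which estimator the authors used. The DerSimonian-Laird random-effects estimator here is an assumption, and the function name `pool_random_effects` is hypothetical.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool study-specific estimates with a DerSimonian-Laird random-effects model.

    Assumption: this estimator is chosen for illustration only; the source
    does not state which pooling method was used.
    Returns (pooled_estimate, pooled_standard_error, tau_squared).
    """
    variances = [se ** 2 for se in std_errors]
    k = len(estimates)

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures the observed between-study dispersion.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))

    # Method-of-moments between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    pooled_se = math.sqrt(1.0 / sum_w_star)
    return pooled, pooled_se, tau2
```

In a setting like the one described, the inputs would be the confounder-adjusted exposure effects (for example, log odds ratios) from each study. Studies lacking the confounder would contribute estimates obtained after imputation.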

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Research Synthesis Methods: 20 papers in training set, Top 0.1%, 14.9%
2. BMC Medical Research Methodology: 43 papers in training set, Top 0.1%, 14.9%
3. American Journal of Epidemiology: 57 papers in training set, Top 0.1%, 10.2%
4. Journal of Clinical Epidemiology: 28 papers in training set, Top 0.1%, 10.2%
(50% of probability mass above)
5. International Journal of Epidemiology: 74 papers in training set, Top 0.2%, 8.5%
6. Epidemiology: 26 papers in training set, Top 0.1%, 8.5%
7. BMJ Open: 554 papers in training set, Top 4%, 4.3%
8. BMC Medicine: 163 papers in training set, Top 1%, 3.6%
9. European Journal of Epidemiology: 40 papers in training set, Top 0.1%, 3.6%
10. PLOS ONE: 4510 papers in training set, Top 48%, 2.1%
11. Pharmacoepidemiology and Drug Safety: 13 papers in training set, Top 0.2%, 1.5%
12. Systematic Reviews: 11 papers in training set, Top 0.2%, 1.5%
13. BMC Research Notes: 29 papers in training set, Top 0.3%, 1.1%
14. PLOS Biology: 408 papers in training set, Top 18%, 0.8%
15. Statistics in Medicine: 34 papers in training set, Top 0.3%, 0.8%
16. Trials: 25 papers in training set, Top 2%, 0.8%
17. Journal of Biomedical Informatics: 45 papers in training set, Top 1%, 0.8%
18. Nature Communications: 4913 papers in training set, Top 65%, 0.6%
19. The Lancet Global Health: 24 papers in training set, Top 1%, 0.5%
20. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 3%, 0.5%
21. PLOS Medicine: 98 papers in training set, Top 6%, 0.5%