
Multiple imputation assuming missing at random: auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random

Curnow, E.; Cornish, R. P.; Heron, J.; Carpenter, J. R.; Tilling, K.

2023-10-17 epidemiology
10.1101/2023.10.17.23297137 medRxiv

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). MI is valid (given correctly specified models) if data are missing at random, conditional on the observed data, but not (unless additional information is available) if data are missing not at random (MNAR). In this paper we explore a previously suggested strategy for data that are MNAR, namely, including in the imputation model an auxiliary variable that is predictive of missingness but not of the missing data. We quantify, algebraically and by simulation, the magnitude of additional bias of the MI estimator, over and above any bias due to data MNAR, from including such an auxiliary variable. We demonstrate that where missingness is caused by the outcome, additional bias can be substantial when the outcome is partially observed. Furthermore, if missingness is caused by both the outcome and the exposure, additional bias can be even larger when either the outcome or the exposure is partially observed. When using MI, it is important to identify, through a combination of data exploration and consideration of plausible causal diagrams and missingness mechanisms, the auxiliary variables most predictive of the missing data (in addition to all variables required for the analysis model and/or to minimise bias due to MNAR).
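
The mechanism described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code or simulation design. It assumes a fully observed exposure X, a partially observed outcome Y = X + noise, and an auxiliary variable Z that is independent of both. It also assumes a logistic missingness model in which high Y and high Z make Y more likely to be missing; the coefficients b_y and c_z and all function names are hypothetical. For simplicity it estimates the mean of Y rather than an exposure-outcome regression coefficient. It uses "improper" stochastic regression imputation: the imputation model is fitted once to the complete cases, with no parameter draws. That is enough to show point-estimate bias, but it does not give valid standard errors.

```python
import math
import random


def expit(s):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-s))


def ols(rows, ys):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Solved by Gaussian elimination with partial pivoting; returns the
    coefficients and the residual standard deviation.
    """
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for i in range(col + 1, k):
            f = a[i][col] / a[col][col]
            for j in range(col, k):
                a[i][j] -= f * a[col][j]
            b[i] -= f * b[col]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back-substitution on the upper-triangular system
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    resid = [y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(rows, ys)]
    sigma = math.sqrt(sum(e * e for e in resid) / (len(ys) - k))
    return coef, sigma


def simulate(n=20000, b_y=1.0, c_z=2.0, seed=1):
    """Hypothetical data-generating model: Y is MNAR, Z predicts only missingness."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # exposure, always observed
        y = x + rng.gauss(0.0, 1.0)        # outcome, partially observed
        z = rng.gauss(0.0, 1.0)            # auxiliary: independent of x and y
        p_obs = expit(-b_y * y - c_z * z)  # high y or high z -> more likely missing
        data.append((x, y, z, rng.random() < p_obs))
    return data


def mi_mean(data, use_z, m=20, seed=2):
    """Estimate mean(Y) by stochastic regression imputation over m imputations.

    Simplification: the imputation model is fitted once to the complete cases
    and its parameters are not redrawn per imputation ("improper" MI).
    """
    rng = random.Random(seed)

    def design(x, z):
        return [1.0, x, z] if use_z else [1.0, x]

    complete = [(x, y, z) for x, y, z, observed in data if observed]
    coef, sigma = ols([design(x, z) for x, _, z in complete],
                      [y for _, y, _ in complete])
    estimates = []
    for _ in range(m):
        total = 0.0
        for x, y, z, observed in data:
            if observed:
                total += y  # true y is used only where it is observed
            else:           # draw from the fitted normal linear model
                mu = sum(c * v for c, v in zip(coef, design(x, z)))
                total += mu + rng.gauss(0.0, sigma)
        estimates.append(total / len(data))
    return sum(estimates) / m  # Rubin's rules: point estimate is the average


data = simulate()
full_mean = sum(y for _, y, _, _ in data) / len(data)  # benchmark with no missing data
bias_x = mi_mean(data, use_z=False) - full_mean
bias_xz = mi_mean(data, use_z=True) - full_mean
print(f"bias, imputation model Y ~ X:     {bias_x:+.3f}")
print(f"bias, imputation model Y ~ X + Z: {bias_xz:+.3f}")
```

Because missingness here depends on both Y and Z, restricting to the complete cases induces an association between Y and Z (selection on a common effect), even though they are independent in the full data. The imputation model that includes Z therefore picks up a spurious negative Z coefficient. The incomplete cases have higher Z on average, so their imputed values are pulled further below the truth. Under these assumptions, the second printed bias should be larger in magnitude than the first.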

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Epidemiology: 26 papers in training set; Top 0.1%; 18.7%
2. Statistics in Medicine: 34 papers in training set; Top 0.1%; 17.6%
3. American Journal of Epidemiology: 57 papers in training set; Top 0.1%; 10.5%
4. International Journal of Epidemiology: 74 papers in training set; Top 0.1%; 10.1%

50% of probability mass above

5. Epidemics: 104 papers in training set; Top 0.2%; 6.3%
6. BMC Medical Research Methodology: 43 papers in training set; Top 0.2%; 4.3%
7. Epidemiology and Infection: 84 papers in training set; Top 0.5%; 3.6%
8. Genetic Epidemiology: 46 papers in training set; Top 0.2%; 3.1%
9. PLOS ONE: 4510 papers in training set; Top 46%; 2.4%
10. European Journal of Epidemiology: 40 papers in training set; Top 0.3%; 1.9%
11. Journal of The Royal Society Interface: 189 papers in training set; Top 2%; 1.9%
12. Scientific Reports: 3102 papers in training set; Top 55%; 1.8%
13. PLOS Genetics: 756 papers in training set; Top 9%; 1.7%
14. PLOS Computational Biology: 1633 papers in training set; Top 19%; 1.3%
15. International Journal of Infectious Diseases: 126 papers in training set; Top 3%; 0.9%
16. Nature Communications: 4913 papers in training set; Top 61%; 0.8%
17. eLife: 5422 papers in training set; Top 55%; 0.8%
18. Biology: 43 papers in training set; Top 3%; 0.6%
19. Bulletin of Mathematical Biology: 84 papers in training set; Top 2%; 0.6%
20. Methods in Ecology and Evolution: 160 papers in training set; Top 3%; 0.6%
21. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 47%; 0.6%