Multiple imputation assuming missing at random: auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random

Curnow, E.; Cornish, R. P.; Heron, J.; Carpenter, J. R.; Tilling, K.

2023-10-17 epidemiology

10.1101/2023.10.17.23297137 medRxiv

Show abstract

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). MI is valid (given correctly-specified models) if data are missing at random, conditional on the observed data, but not (unless additional information is available) if data are missing not at random (MNAR). In this paper we explore a previously-suggested strategy, namely, including an auxiliary variable predictive of missingness but not the missing data in the imputation model, when data are MNAR. We quantify, algebraically and by simulation, the magnitude of additional bias of the MI estimator, over and above any bias due to data MNAR, from including such an auxiliary variable. We demonstrate that where missingness is caused by the outcome, additional bias can be substantial when the outcome is partially observed. Furthermore, if missingness is caused by the outcome and the exposure, additional bias can be even larger, when either the outcome or exposure is partially observed. When using MI, it is important to identify, through a combination of data exploration and considering plausible casual diagrams and missingness mechanisms, the auxiliary variables most predictive of the missing data (in addition to all variables required for the analysis model and/or to minimise bias due to MNAR).

Multiple imputation assuming missing at random: auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random

Matching journals