Propensity-score matching with GAN-generated observations from electronic health records: simulation study and application to the evaluation of prone positioning in COVID-19 patients under mechanical ventilation
Bouvarel, B.; Glemain, B.; Carrat, F.; Lapidus, N.
Background: Propensity score (PS) methods are widely used in observational studies to estimate causal effects, but they often exclude patients who lack comparable counterparts, which reduces power and can introduce bias. Generative adversarial networks (GANs) have shown promise in creating synthetic data, but their application to causal inference remains underexplored. Synthetic data could serve as plausible counterfactuals, potentially mitigating these limitations of PS methods. This study evaluates the integration of GAN-generated synthetic observations into propensity score matching (PSM) to improve the emulation of randomized controlled trials (RCTs), using both simulated and real-world electronic health record (EHR) data.
Methods: A simulation study with predefined confounding structures compared traditional PSM against two hybrid approaches that incorporate GAN-generated synthetic patients to partially or fully match the original sample of patients. Treatment effects were estimated via logistic regression, and performance was assessed by bias, standard error, alpha risk, power, and confidence interval coverage. The methods were then applied to a real-world dataset of mechanically ventilated COVID-19 patients to evaluate the impact of early prone positioning on 28-day mortality.
Results: In simulations, GAN-generated patients made it possible to match all patients in the original sample, whereas PSM dropped up to 60% of them. Although synthetic augmentation increased the sample size, using synthetic matches without adjustment led to underestimated standard errors and inflated type I error. Down-weighting matched synthetic data improved error control but did not consistently outperform PSM in bias or power. In the real-world application (n=1399), treatment effect estimates for prone positioning were similar across all methods and did not reach statistical significance.
Conclusion: GAN-augmented propensity score matching can reduce sample loss. However, its current usefulness for causal inference through PS matching remains limited. Synthetic data do not contribute independent information and must be integrated cautiously to avoid misleading precision. Although the approach is promising, current GAN implementations require methodological refinement before routine use in causal inference.
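To illustrate how conventional PSM can lose patients for lack of comparable counterparts, here is a minimal Python sketch of greedy 1:1 nearest-neighbour matching with a caliper. The abstract does not specify the matching algorithm, so the function `greedy_caliper_match`, the caliper of 0.05, and the toy propensity scores are all assumptions made for illustration and do not reproduce the study's procedure.

```python
import random

def greedy_caliper_match(ps_treated, ps_control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    without replacement (hypothetical helper, not the authors' code).

    Returns (treated_index, control_index) pairs. A treated unit whose
    nearest unused control is farther than `caliper` stays unmatched
    and would be dropped from the matched analysis.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        if not available:
            break  # no controls left to match
        # Closest control that has not been used yet
        j = min(available, key=lambda k: abs(ps_control[k] - p))
        if abs(ps_control[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

# Toy propensity scores with limited overlap (illustrative values only)
random.seed(0)
ps_treated = [random.uniform(0.40, 0.95) for _ in range(50)]
ps_control = [random.uniform(0.05, 0.60) for _ in range(50)]

pairs = greedy_caliper_match(ps_treated, ps_control, caliper=0.05)
dropped = 1 - len(pairs) / len(ps_treated)
print(f"matched {len(pairs)} of {len(ps_treated)} treated; dropped {dropped:.0%}")
```

Treated patients with propensity scores outside the range covered by the controls find no match within the caliper. These are the patients for whom the hybrid approaches described above would generate synthetic counterparts instead of excluding them.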
Matching journals
The top 2 journals account for 50% of the predicted probability mass.