
A plasmode simulation-based bias analysis for residual confounding by unmeasured variables leveraging information-rich subsets

Desai, R. J.; Wang, S.; Pillai, H. S.; Mahesri, M.; Gu, B.; Lii, J.; Dutcher, S. K.; Jones, C.; Shebl, F. M.; Bradley, M. C.; Hua, W.; Lee, H.; Dal Pan, G. J.; Ball, R.; Schneeweiss, S. S.

2025-10-31 epidemiology
10.1101/2025.10.28.25338968 medRxiv

Background: Quantitative bias analyses often rely on unrealistic assumptions and do not fully reflect the complexities of healthcare data.

Methods: We describe a plasmode simulation-based bias analysis for residual confounding from unmeasured variables that leverages granular information from a subset of cohort members. We generated 500 simulated cohorts based on individual-level claims and linked electronic health record (EHR) data identifying new users of varenicline and bupropion from the Mass General Brigham site of the FDA Sentinel Real World Evidence Data Enterprise. Two adverse outcomes were simulated: (1) neuropsychiatric hospitalizations and (2) major adverse cardiovascular events (MACE). Measured confounding factors, identified from information available in claims (demographics, comorbid conditions, and comedications), were tailored to each outcome. Residual confounding was simulated using potential confounders measured in EHRs but unmeasured in claims: suicidal ideation for the neuropsychiatric outcome, and body mass index (BMI), blood pressure (BP), and smoking pack-years for the MACE outcome. The simulations retained the correlation between claims- and EHR-based confounders observed in the empirical data, for a realistic reflection of proxy adjustment of unmeasured confounders. Analyses were conducted in the simulated data with and without adjustment for the EHR-based covariates to evaluate the extent of residual confounding in claims-only analyses.

Results: Across 500 simulations, the median absolute standardized mean difference (ASMD) between treatment groups in the unadjusted sample was 0.16 for suicidal ideation and <0.1 for BMI, BP, and smoking pack-years. For both outcomes, adjustment using claims-based variables yielded relative bias close to 0, leading to the conclusion that EHR-measured confounders unmeasured in claims were unlikely to produce strong residual confounding within realistic simulations informed by empirical data.

Conclusion: The proposed approach provides a method for quantifying bias in non-randomized studies threatened by the unavailability of potentially important confounding variables.

Key points:
- Residual confounding by unmeasured factors is a central threat in pharmacoepidemiology that is almost always acknowledged in published studies but seldom quantified.
- We describe a plasmode simulation-based approach to systematically design quantitative bias analyses that reflect the complexities of routinely collected healthcare data by leveraging detailed electronic health records from a subset.
- We provide open-source software code to enable other researchers to adopt this method in future studies and improve the reliability of their findings.

Plain language summary: This study introduces a new way for researchers to better understand and measure bias caused by missing health information in large insurance databases. Using detailed hospital records alongside insurance claims data, we created realistic computer simulations to test how much of the observed risk in safety studies could be explained away by missing important health factors, like depression or smoking habits, that aren't always recorded in insurance data. The approach is flexible, uses real patient data, and helps researchers make stronger, more reliable conclusions about the risks and benefits of treatments, even when some patient information is not available in all records.
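The authors release open-source code for their method; as a rough illustration of the workflow the abstract describes (a claims-measured confounder correlated with an EHR-only confounder, repeated simulated cohorts under a null treatment effect, ASMD of the unmeasured confounder, and adjusted vs. unadjusted effect estimates), here is a minimal, hypothetical sketch. All names, parameter values, and distributions below are illustrative assumptions, not the paper's actual data-generating model or software:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cohort(n, rho, beta_u, true_log_or=0.0):
    """One plasmode-style cohort (illustrative, not the paper's model):
    x is a claims-measured confounder, u an EHR-only confounder correlated
    with x at rho -- this correlation is what makes claims-based adjustment
    act as a partial proxy for u."""
    x = rng.normal(size=n)
    u = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
    p_treat = 1.0 / (1.0 + np.exp(-(0.5 * x + 0.5 * u)))     # confounded treatment choice
    t = rng.binomial(1, p_treat)
    logit_y = -2.0 + true_log_or * t + 0.5 * x + beta_u * u  # null treatment effect by default
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_y)))
    return x, u, t, y

def asmd(v, t):
    """Absolute standardized mean difference of covariate v between arms."""
    s = np.sqrt((v[t == 1].var() + v[t == 0].var()) / 2.0)
    return abs(v[t == 1].mean() - v[t == 0].mean()) / s

def logit_coef(covars, y, iters=25):
    """Logistic regression via Newton-Raphson; returns the coefficient of
    the first covariate (the treatment indicator)."""
    X = np.column_stack([np.ones(len(y))] + covars)
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        w = p * (1.0 - p)
        b += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return b[1]

# Repeat the simulation; since the true log-OR is 0, each estimate is pure bias.
est_claims, est_full, asmd_u = [], [], []
for _ in range(200):
    x, u, t, y = simulate_cohort(n=5000, rho=0.6, beta_u=0.5)
    asmd_u.append(asmd(u, t))                   # imbalance in the unmeasured confounder
    est_claims.append(logit_coef([t, x], y))    # claims-only adjustment
    est_full.append(logit_coef([t, x, u], y))   # claims + EHR adjustment

bias_claims = float(np.median(est_claims))
bias_full = float(np.median(est_full))
print(f"median ASMD for u (unadjusted): {np.median(asmd_u):.2f}")
print(f"median log-OR bias, claims-only: {bias_claims:.3f}; with EHR covariate: {bias_full:.3f}")
```

Comparing `bias_claims` against `bias_full` across simulated cohorts mirrors the paper's question: how much residual confounding survives when only the claims-based proxies are adjusted for.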

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank  Journal                                                  Papers in training set  Percentile  Probability
1     BMC Medical Research Methodology                                             43  Top 0.1%    28.3%
2     Pharmacoepidemiology and Drug Safety                                         13  Top 0.1%    12.6%
3     Epidemiology                                                                 26  Top 0.1%    10.3%
4     PLOS ONE                                                                   4510  Top 24%      7.0%
5     BMJ Open                                                                    554  Top 4%       5.0%
6     Journal of Biomedical Informatics                                            45  Top 0.5%     3.3%
7     npj Digital Medicine                                                         97  Top 1%       3.1%
8     American Journal of Epidemiology                                             57  Top 0.4%     2.7%
9     Journal of the American Medical Informatics Association                      61  Top 1%       2.1%
10    Research Synthesis Methods                                                   20  Top 0.1%     1.9%
11    Journal of Clinical Epidemiology                                             28  Top 0.2%     1.8%
12    BMC Medical Informatics and Decision Making                                  39  Top 1%       1.7%
13    BMC Medicine                                                                163  Top 3%       1.7%
14    Psychological Medicine                                                       74  Top 1%       1.4%
15    Statistics in Medicine                                                       34  Top 0.3%     0.9%
16    JAMA Network Open                                                           127  Top 4%       0.8%
17    Frontiers in Pharmacology                                                   100  Top 4%       0.8%
18    PLOS Medicine                                                                98  Top 5%       0.7%
19    PLOS Biology                                                                408  Top 20%      0.7%
20    International Journal of Epidemiology                                        74  Top 3%       0.7%
21    Medical Decision Making                                                      10  Top 0.4%     0.5%
22    European Journal of Epidemiology                                             40  Top 1.0%     0.5%
23    Nature Communications                                                      4913  Top 67%      0.5%