
A plasmode simulation-based bias analysis for residual confounding by unmeasured variables leveraging information-rich subsets

Desai, R. J.; Wang, S.; Pillai, H. S.; Mahesri, M.; Gu, B.; Lii, J.; Dutcher, S. K.; Jones, C.; Shebl, F. M.; Bradley, M. C.; Hua, W.; Lee, H.; Dal Pan, G. J.; Ball, R.; Schneeweiss, S. S.

2025-10-31 epidemiology
10.1101/2025.10.28.25338968 medRxiv

Background: Quantitative bias analyses often rely on unrealistic assumptions and do not fully reflect the complexities of healthcare data.

Methods: We describe a plasmode simulation-based bias analysis for residual confounding from unmeasured variables that leverages granular information from a subset of cohort members. We generated 500 simulated cohorts based on individual-level claims and linked electronic health record (EHR) data identifying new users of varenicline and bupropion from the Mass General Brigham site of the FDA Sentinel Real World Evidence Data Enterprise. Two adverse outcomes were simulated: (1) neuropsychiatric hospitalizations and (2) major adverse cardiovascular events (MACE). Measured confounding factors, identified from information available in claims (demographics, comorbid conditions, and comedications), were tailored to each outcome. Residual confounding was simulated using potential confounders measured in EHRs but unmeasured in claims: suicidal ideation for the neuropsychiatric outcome, and body mass index (BMI), blood pressure (BP), and smoking pack-years for the MACE outcome. The simulations retained the correlation between claims- and EHR-based confounders observed in the empirical data, for a realistic reflection of proxy adjustment of unmeasured confounders. Analyses were conducted in the simulated data with and without adjustment for the EHR-based covariates to evaluate the extent of residual confounding in claims-only analyses.

Results: Across 500 simulations, the median absolute standardized mean difference (ASMD) between treatment groups in the unadjusted sample was 0.16 for suicidal ideation and <0.1 for BMI, BP, and smoking pack-years. For both outcomes, adjustment using claims-based variables yielded relative bias close to 0, leading to the conclusion that EHR-measured confounders unmeasured in claims were unlikely to produce strong residual confounding within realistic simulations informed by empirical data.

Conclusion: The proposed approach provides a method for quantifying bias in non-randomized studies threatened by the unavailability of potentially important confounding variables.

Key points:
- Residual confounding by unmeasured factors is a central threat in pharmacoepidemiology that is almost always acknowledged in published studies but seldom quantified.
- We describe a plasmode simulation-based approach to systematically design quantitative bias analyses that reflect the complexities of routinely collected healthcare data by leveraging detailed electronic health records from a subset.
- We provide open-source software code to enable other researchers to adopt this method in future studies and improve the reliability of their findings.

Plain language summary: This study introduces a new way for researchers to better understand and measure bias caused by missing health information in large insurance databases. Using detailed hospital records alongside insurance claims data, we created realistic computer simulations to test how much of the observed risk in safety studies could be explained away by missing important health factors, like depression or smoking habits, that aren't always recorded in insurance data. The approach is flexible, uses real patient data, and helps researchers make stronger, more reliable conclusions about the risks and benefits of treatments, even when some patient information is not available in all records.
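The authors release open-source code for their method; as a rough illustration of the workflow the abstract describes (a claims-measured confounder correlated with an EHR-only confounder, repeated simulated cohorts under a null treatment effect, ASMD of the unmeasured confounder, and adjusted vs. unadjusted effect estimates), here is a minimal, hypothetical sketch. All names, parameter values, and distributions below are illustrative assumptions, not the paper's actual data-generating model or software:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cohort(n, rho, beta_u, true_log_or=0.0):
    """One plasmode-style cohort (illustrative, not the paper's model):
    x is a claims-measured confounder, u an EHR-only confounder correlated
    with x at rho -- this correlation is what makes claims-based adjustment
    act as a partial proxy for u."""
    x = rng.normal(size=n)
    u = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
    p_treat = 1.0 / (1.0 + np.exp(-(0.5 * x + 0.5 * u)))     # confounded treatment choice
    t = rng.binomial(1, p_treat)
    logit_y = -2.0 + true_log_or * t + 0.5 * x + beta_u * u  # null treatment effect by default
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_y)))
    return x, u, t, y

def asmd(v, t):
    """Absolute standardized mean difference of covariate v between arms."""
    s = np.sqrt((v[t == 1].var() + v[t == 0].var()) / 2.0)
    return abs(v[t == 1].mean() - v[t == 0].mean()) / s

def logit_coef(covars, y, iters=25):
    """Logistic regression via Newton-Raphson; returns the coefficient of
    the first covariate (the treatment indicator)."""
    X = np.column_stack([np.ones(len(y))] + covars)
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        w = p * (1.0 - p)
        b += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return b[1]

# Repeat the simulation; since the true log-OR is 0, each estimate is pure bias.
est_claims, est_full, asmd_u = [], [], []
for _ in range(200):
    x, u, t, y = simulate_cohort(n=5000, rho=0.6, beta_u=0.5)
    asmd_u.append(asmd(u, t))                   # imbalance in the unmeasured confounder
    est_claims.append(logit_coef([t, x], y))    # claims-only adjustment
    est_full.append(logit_coef([t, x, u], y))   # claims + EHR adjustment

bias_claims = float(np.median(est_claims))
bias_full = float(np.median(est_full))
print(f"median ASMD for u (unadjusted): {np.median(asmd_u):.2f}")
print(f"median log-OR bias, claims-only: {bias_claims:.3f}; with EHR covariate: {bias_full:.3f}")
```

Comparing `bias_claims` against `bias_full` across simulated cohorts mirrors the paper's question: how much residual confounding survives when only the claims-based proxies are adjusted for.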

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank  Journal                                                  Papers in training set  Percentile  Probability
1     BMC Medical Research Methodology                                             43  Top 0.1%    28.3%
2     Pharmacoepidemiology and Drug Safety                                         13  Top 0.1%    12.6%
3     Epidemiology                                                                 26  Top 0.1%    10.3%
4     PLOS ONE                                                                   4510  Top 24%      7.0%
5     BMJ Open                                                                    554  Top 4%       5.0%
6     Journal of Biomedical Informatics                                            45  Top 0.5%     3.3%
7     npj Digital Medicine                                                         97  Top 1%       3.1%
8     American Journal of Epidemiology                                             57  Top 0.4%     2.7%
9     Journal of the American Medical Informatics Association                      61  Top 1%       2.1%
10    Research Synthesis Methods                                                   20  Top 0.1%     1.9%
11    Journal of Clinical Epidemiology                                             28  Top 0.2%     1.8%
12    BMC Medical Informatics and Decision Making                                  39  Top 1%       1.7%
13    BMC Medicine                                                                163  Top 3%       1.7%
14    Psychological Medicine                                                       74  Top 1%       1.4%
15    Statistics in Medicine                                                       34  Top 0.3%     0.9%
16    JAMA Network Open                                                           127  Top 4%       0.8%
17    Frontiers in Pharmacology                                                   100  Top 4%       0.8%
18    PLOS Medicine                                                                98  Top 5%       0.7%
19    PLOS Biology                                                                408  Top 20%      0.7%
20    International Journal of Epidemiology                                        74  Top 3%       0.7%
21    Medical Decision Making                                                      10  Top 0.4%     0.5%
22    European Journal of Epidemiology                                             40  Top 1.0%     0.5%
23    Nature Communications                                                      4913  Top 67%      0.5%