Epidemiology
Ovid Technologies (Wolters Kluwer Health)
All preprints, ranked by how well they match Epidemiology's content profile, based on 26 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Curnow, E.; Tilling, K.; Heron, J.; Cornish, R. P.; Carpenter, J. R.
Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables ("auxiliary variables"). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly-chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e. it is a "collider"), its inclusion can induce bias in the MI estimator and may increase SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders. Contribution to the field statement: In multiple imputation (MI), in addition to those required for the substantive analysis, imputation models often include other variables ("auxiliary variables"). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random.
We examine the consequences of a poorly-chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e. it is a "collider"), its inclusion can induce bias in the MI estimator and may increase SE. We demonstrate that when the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. We recommend a combination of data exploration and consideration of plausible causal diagrams and missingness mechanisms to examine whether potential auxiliary variables are colliders.
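The collider mechanism in this abstract is easy to reproduce in a toy simulation. The sketch below is not the authors' code: it uses a single stochastic regression imputation rather than full MI, targets the marginal mean of the partially observed outcome rather than a regression coefficient, and every parameter value is an invented assumption. It shows how an auxiliary variable with one cause shared with the outcome and another shared with missingness biases the imputed values, while omitting it does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

u1 = rng.normal(size=n)                  # common cause of outcome and auxiliary
u2 = rng.normal(size=n)                  # common cause of auxiliary and missingness
x = rng.normal(size=n)                   # exposure, fully observed
y = 0.5 * x + u1 + rng.normal(size=n)    # outcome, partially observed
a = u1 + u2 + rng.normal(size=n)         # candidate auxiliary variable: a collider
obs = rng.random(n) < 1 / (1 + np.exp(-2 * u2))  # missingness driven by u2 only

def impute_mean(design_cols):
    """Stochastic regression imputation of y from the given columns,
    fitted on complete records; returns the post-imputation mean of y."""
    X = np.column_stack([np.ones(n)] + design_cols)
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    sd = np.std(y[obs] - X[obs] @ beta)
    y_imp = y.copy()
    miss = ~obs
    y_imp[miss] = X[miss] @ beta + rng.normal(scale=sd, size=miss.sum())
    return y_imp.mean()

truth = y.mean()
bias_plain = abs(impute_mean([x]) - truth)        # no auxiliary: ~unbiased here
bias_collider = abs(impute_mean([x, a]) - truth)  # collider auxiliary: biased
print(bias_plain, bias_collider)
```

Because missingness depends only on u2 (independent of x and y), imputation without the auxiliary is valid; adding the collider imports the u2 shift into the imputed outcomes.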
Knight, J.; Wang, S.; Mishra, S.
Background: Two required inputs to mathematical models of sexually transmitted infections are the average duration in epidemiological risk states (e.g., selling sex) and the average rates of sexual partnership change. These variables are often only available as aggregate estimates from published cross-sectional studies, and may be subject to distributional, sampling, censoring, and measurement biases. Methods: We explore adjustments for these biases using aggregate estimates of duration in sex work and numbers of reported sexual partners from a published 2011 survey of female sex workers in Eswatini. We develop adjustments from first principles, and construct Bayesian hierarchical models to reflect our mechanistic assumptions about the bias-generating processes. Results: We show that different mechanisms of bias for duration in sex work may "cancel out" by acting in opposite directions, but that failure to consider some mechanisms could over- or underestimate duration in sex work by factors approaching 2. We also show that conventional interpretations of sexual partner numbers are biased due to implicit assumptions about partnership duration, but that unbiased estimators of partnership change rate can be defined that explicitly incorporate a given partnership duration. We highlight how the unbiased estimator is most important when the survey recall period and partnership duration are similar in length. Conclusions: While we explore these bias adjustments using a particular dataset, and in the context of deriving inputs for mathematical modelling, we expect that our approach and insights would be applicable to other datasets and motivations for quantifying sexual behaviour data.
Hazewinkel, A.-D.; Tilling, K.; Wade, K. H.; Palmer, T. M.
Randomized controlled trials (RCTs) are considered the gold standard for assessing the causal effect of an exposure on an outcome, but are vulnerable to bias from missing data. When outcomes are missing not at random (MNAR), estimates from complete case analysis (CCA) will be biased. There is no statistical test for distinguishing between outcomes missing at random (MAR) and MNAR, and current strategies rely on comparing dropout proportions and covariate distributions, and using auxiliary information to assess the likelihood of dropout being associated with the outcome. We propose using the observed variance difference across treatment groups as a tool for assessing the risk of dropout being MNAR. In an RCT, at randomization, the distributions of all covariates should be equal in the populations randomized to the intervention and control arms. Under the assumption of homogeneous treatment effects, the variance of the outcome will also be equal in the two populations over the course of follow-up. We show that under MAR dropout, the observed outcome variances, conditional on the variables included in the model, are equal across groups, while MNAR dropout may result in unequal variances. Consequently, unequal observed conditional group variances are an indicator of MNAR dropout and possible bias of the estimated treatment effect. Heterogeneity of treatment effect affects the intervention group variance, and is another potential cause of observing different outcome variances. We show that, for longitudinal data, we can isolate the effect of MNAR outcome-dependent dropout by considering the variance difference at baseline in the same set of patients that are observed at final follow-up. We illustrate our method in simulation and in applications using individual-level patient data and summary data.
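A minimal simulation of the variance signal the authors propose (not from the paper; all parameters are assumptions): under MAR-style dropout that depends only on the randomised arm, observed arm variances stay equal, while outcome-dependent (MNAR) dropout truncates the two arms differently and opens a variance gap.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
arm = rng.integers(0, 2, n)                    # 1 = intervention, 0 = control
y = 1.0 * arm + rng.normal(size=n)             # homogeneous treatment effect

# MAR-style dropout: depends only on the randomised arm (a model covariate)
keep_mar = rng.random(n) < np.where(arm == 1, 0.9, 0.7)

# MNAR dropout: patients with higher outcomes tend to drop out
keep_mnar = y + rng.normal(scale=0.5, size=n) < 1.2

def var_gap(keep):
    """Absolute difference in observed outcome variance between arms."""
    return abs(y[keep & (arm == 1)].var() - y[keep & (arm == 0)].var())

print(var_gap(keep_mar), var_gap(keep_mnar))
```

The MNAR rule truncates the intervention arm (mean shifted toward the dropout threshold) more severely than the control arm, so their observed variances diverge even though the full-data variances are identical.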
Hamilton, F. W.; Lee, T. C.; Butler-Laporte, G.
Instrumental variable (IV) analysis is a widely used technique in econometrics to estimate causal effects in the presence of confounding. A recent application of this technique was used in a high-profile analysis in JAMA Internal Medicine to estimate the effect of cefepime, a broad-spectrum antibiotic, on mortality in severe infection. There has been ongoing concern that piperacillin-tazobactam, another broad-spectrum antibiotic with greater anaerobic activity, might be inferior to cefepime; however, this has not been shown in randomized controlled trials. The authors used an international shortage of piperacillin-tazobactam as an instrument, as during this shortage period, cefepime was used as an alternative. The authors report a strong mortality effect (5% absolute increase) with piperacillin-tazobactam. In this paper, we closely examine this estimate and find it is likely conditional on inclusion of a control variable (metronidazole usage). Inclusion of this variable is highly likely to lead to collider bias, which we show via simulation. We then generate estimates unadjusted for metronidazole which are much closer to the null and may represent residual confounding or confounding by indication. We highlight the ongoing challenge of collider bias in empirical IV analyses and the potential for large biases to occur. We finally suggest the authors consider including these unadjusted estimates in their manuscript, as the large increase in mortality reported with piperacillin-tazobactam is unlikely to be true.
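The collider problem described here can be sketched in the abstract (this is a toy simulation, not a reanalysis of the JAMA Internal Medicine data; the variable roles loosely mirror a shortage instrument, treatment received, and a control variable caused by both treatment and unmeasured severity, and all coefficients are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

z = rng.integers(0, 2, n).astype(float)     # instrument: shortage period
u = rng.normal(size=n)                      # unmeasured severity
d = 0.5 * z + rng.normal(size=n)            # treatment received
c = d + u + rng.normal(scale=0.5, size=n)   # control variable: child of d AND u, a collider
y = 0.0 * d + u + rng.normal(size=n)        # true treatment effect is zero

def resid(v, w):
    """Residualise v on w (with intercept)."""
    X = np.column_stack([np.ones(n), w])
    return v - X @ np.linalg.lstsq(X, v, rcond=None)[0]

iv_unadj = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]          # Wald estimator
iv_adj = (np.cov(resid(z, c), resid(y, c))[0, 1]
          / np.cov(resid(z, c), resid(d, c))[0, 1])         # "adjusting" for the collider
print(iv_unadj, iv_adj)
```

With a zero true effect, the unadjusted Wald estimate sits near the null, while conditioning on the collider opens a path between the instrument and the unmeasured severity and produces a large spurious effect.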
Hazewinkel, A.-D.; Gregson, J.; Bartlett, J. W.; Gasparyan, S. B.; Wright, D.; Pocock, S.
Objectives: Introducing a new covariate adjustment method for hierarchical outcomes using ordinal logistic regression, comparing it with existing approaches, and assessing whether adjustment improves power in randomized trials with hierarchical outcomes. Methods: We developed an ordinal regression-based method for covariate adjustment of the win ratio and compared it with three alternatives: probability index models, inverse probability weighting, and a randomization-based estimator. Methods were applied to the EMPEROR-Preserved trial and tested through extensive simulations involving two common hierarchical outcome structures: time-to-event composites, and composites combining time-to-event with quantitative measures. Simulations assessed impacts on estimates, standard errors, and power across prognostic and non-prognostic settings. Results: In RCT data and simulations, covariate adjustment consistently increased power when adjusting for prognostic baseline variables. Gains were comparable to or greater than those in conventional Cox models, with no power loss for non-prognostic covariates. Our ordinal approach performed similarly to existing methods while providing interpretable covariate effect estimates. Adjusting for baseline values of quantitative components yielded power gains according to the baseline-to-follow-up correlation. Conclusions: Covariate adjustment for prognostic variables meaningfully improves efficiency in win ratio analyses for hierarchical outcomes. Our ordinal method is easily implemented and facilitates covariate effect interpretation. We recommend the broader adoption of covariate adjustment and our ordinal method in randomized trials using hierarchical outcomes.
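For readers unfamiliar with the win ratio itself, a bare-bones unadjusted version looks like the sketch below (hypothetical data, simplified censoring handling; the paper's contribution is the covariate adjustment built on top of this pairwise comparison):

```python
def compare(a, b):
    """Hierarchical comparison of two patients, each a tuple
    (follow_up_time, died, score). Returns +1 if a wins, -1 if b wins, 0 if tied.
    First level: survival (a win requires outliving the other's observed death);
    second level: higher quantitative score wins."""
    time_a, died_a, score_a = a
    time_b, died_b, score_b = b
    if died_b and time_a > time_b:      # a was followed beyond b's death
        return 1
    if died_a and time_b > time_a:      # b was followed beyond a's death
        return -1
    if score_a != score_b:              # survival indeterminate: use the score
        return 1 if score_a > score_b else -1
    return 0

def win_ratio(treated, control):
    """Total treatment wins divided by total treatment losses over all pairs."""
    wins = sum(1 for t in treated for c in control if compare(t, c) == 1)
    losses = sum(1 for t in treated for c in control if compare(t, c) == -1)
    return wins / losses

treated = [(10, True, 5), (12, False, 2)]
control = [(4, True, 3), (12, False, 1)]
print(win_ratio(treated, control))   # 3 wins, 1 loss -> 3.0
```

Real analyses restrict second-level comparisons to shared follow-up and handle ties more carefully; this sketch only illustrates the hierarchical pairwise logic.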
Hripcsak, G.; Anand, T.; Chen, H. Y.; Zhang, L.; Chen, Y.; Suchard, M. A.; Ryan, P. B.; Schuemie, M. J.
Propensity score adjustment is commonly used in observational research to address confounding. Controversy persists about how to select covariates as possible confounders to generate the propensity model. A desire to include all possible confounders is offset by a concern that more covariates will augment bias or increase variance. Much of the concern is over instruments, which are variables that affect the treatment but not the outcome. Adjusting for an instrument has been shown to increase bias due to unadjusted confounding and to increase the variance of the effect estimate. Large-scale propensity score (LSPS) adjustment includes most available pre-treatment covariates in its propensity model. It addresses instruments with a pair of diagnostics, ceasing the analysis if any covariate exceeds a correlation coefficient of 0.5 with the treatment and checking for an aggregation of instruments with equipoise reported as a preference score. Our simulation assesses the impact of adjusting for instruments in the context of LSPS's diagnostics. In our simulation, even when the variance of the treatment contributed by the adjusted instrument(s) exceeds an unadjusted confounder by over twenty-fold, when the correlation between the instrument(s) and the treatment was less than 0.5 and the equipoise was greater than 0.5, the additional shift in the effect estimate due to adjusting for the instrument(s) was less than the shift due to confounding by itself. Therefore, we find in this simulation that adjusting for instruments contributed a minor amount of bias to the effect estimate. This simulation aligns well with a previous assessment of the impact of adjusting for instruments and with separate empirical evidence that adjusting for many covariates surpasses attempts to identify a limited set of confounders.
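The two diagnostics described can be sketched on synthetic data as below. This is not the LSPS implementation; the 0.3-0.7 preference-score window used for equipoise follows the usual convention and, like the data-generating values, is my assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
X = rng.normal(size=(n, 3))                          # pre-treatment covariates
t = (rng.random(n) < 1 / (1 + np.exp(-0.3 * X[:, 0]))).astype(float)  # mild confounding

def fit_logistic(Xd, y, iters=25):
    """Newton-Raphson logistic regression (design matrix includes intercept)."""
    b = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xd @ b))
        H = (Xd * (p * (1 - p))[:, None]).T @ Xd + 1e-9 * np.eye(Xd.shape[1])
        b += np.linalg.solve(H, Xd.T @ (y - p))
    return b

# Diagnostic 1: stop if any covariate correlates with treatment above 0.5
max_corr = max(abs(np.corrcoef(X[:, j], t)[0, 1]) for j in range(3))

# Diagnostic 2: equipoise via preference scores, i.e. the propensity logit
# shifted by the logit of treatment prevalence
Xd = np.column_stack([np.ones(n), X])
ps = 1 / (1 + np.exp(-Xd @ fit_logistic(Xd, t)))
prev = t.mean()
pref_logit = np.log(ps / (1 - ps)) - np.log(prev / (1 - prev))
pref = 1 / (1 + np.exp(-pref_logit))
equipoise = ((pref >= 0.3) & (pref <= 0.7)).mean()   # share in [0.3, 0.7]
print(max_corr, equipoise)
```

With mild confounding, both diagnostics pass (correlation well below 0.5, equipoise well above 0.5), which is the regime the abstract's simulation finds safe.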
Liu, C.; Mayer, M.; Lactaoen, K.; Gomez, L.; Weissman, G.; Hubbard, R.
Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-exchangeability between internal and external control patients. To address this challenge, we developed a sensitivity analysis framework to assess the robustness of HCT results to potential unmeasured confounding. We propose a tipping point analysis that adapts the E-value framework to the HCT setting where trial participation rather than treatment assignment is subject to confounding. To aid interpretation, we also introduce a data-driven benchmark representing the strength of unmeasured confounding reflected by the observed outcome non-exchangeability. We then propose an operational decision rule and evaluate its performance through simulation studies. Finally, we illustrate the approach using an asthma trial augmented by data from electronic health records. Simulation results demonstrate that our decision rule safeguards against Type I error inflation while preserving the power gains achieved by incorporating external data. In settings where moderate unmeasured confounding led to poorer outcomes for external controls, Type I error was controlled near the nominal 5% level, and power increased by 10-20% compared with analyses using RCT data alone. Our approach provides a practical, interpretable method to assess HCT robustness, supporting rigorous inference when integrating external real-world data.
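The paper adapts E-value reasoning to confounding of trial participation rather than treatment assignment; the underlying E-value computation it builds on (VanderWeele and Ding's formula, shown here on made-up risk ratios) is:

```python
import math

def e_value(rr):
    """E-value for a risk ratio: the minimum strength of association an
    unmeasured confounder would need with both group membership and outcome
    to fully explain away the observed association."""
    rr = max(rr, 1 / rr)             # direction-symmetric: work with RR >= 1
    return rr + math.sqrt(rr * (rr - 1))

# Tipping-point style use: the E-value of the CI limit closest to the null
# indicates how much confounding would be needed to tip significance.
print(e_value(1.8), e_value(1.2))
```

A null risk ratio gives an E-value of 1 (no confounding needed), and the function grows quickly with the observed effect size, which is what makes large effects hard to explain away.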
Brown, J. P.; Yland, J. J.; Williams, P. L.; Huybrechts, K. F.; Hernandez-Diaz, S.
The analysis of perinatal studies is complicated by twins and other multiple births even when they are not the exposure, outcome, or a confounder of interest. Common approaches to handling multiples in studies of infant outcomes include restriction to singletons, counting outcomes at the pregnancy-level (i.e., by counting if at least one twin experienced a binary outcome), or infant-level analysis including all infants and, typically, accounting for clustering of outcomes by using generalised estimating equations or mixed effects models. Several healthcare administration databases only support restriction to singletons or pregnancy-level approaches. For example, in MarketScan insurance claims data, diagnoses in twins are often assigned to a single infant identifier, thereby preventing ascertainment of infant-level outcomes among multiples. Different approaches correspond to different causal questions, produce different estimands, and often rely on different assumptions. We demonstrate the differences that can arise from these different approaches using Monte Carlo simulations, algebraic formulas, and an applied example. Furthermore, we provide guidance on the handling of multiples in perinatal studies when using healthcare administration data.
Smith, E. G.; Netherton, D. M.
Confounding is one of the most important concerns for randomized or nonrandomized intervention or exposure studies. This manuscript describes several metrics intended to provide quantitative approximations of confounding under certain conditions. Each metric quantifies differences in risk between intervention arms during time periods when the intervention (or exposure of interest) is not occurring. Because exposure is absent, these metrics have the potential to summarize the effects of other measured and unmeasured factors on outcome risk. A null association (e.g., a risk, rate, or hazard ratio of approximately 1.0 or a risk difference of 0.0) between intervention arms during nonexposure can suggest equal impacts from baseline factors affecting each study arm (i.e., an absence of confounding). However, other factors such as attrition bias, postexposure effects, or incomplete representation of the full cohort can also affect risks during nonexposure, causing the nonexposure risk to represent confounding less accurately. We propose four nonexposure metrics designed to limit these other influences on nonexposure risk, thus providing nonexposure risks that more exclusively reflect confounding. The metrics, however, vary in their potential to limit these other influences and also vary in their sensitivity to random error. We then demonstrate what we expect to be the most widely useful metric currently, the "briefly-exposed postexposure (bePE) risk metric." We show how the bePE risk metric can inform multiple aspects of a real-world study, such as cohort derivation and interpretation of findings. Definitive validation of nonexposure risk metrics awaits further research. Nevertheless, these metrics have the potential to substantially improve intervention and exposure studies by approximating confounding under certain conditions. Their testing and validation should be a research priority.
Sondhi, A.; Humblet, O.; Swaminathan, A.
In real world data (RWD) studies, observed datasets are often subject to left truncation, which can bias estimates of survival parameters. Standard methods can only suitably account for left truncation when survival and entry time are independent. Therefore, in the dependent left truncation setting, it is important to quantify the magnitude and direction of estimator bias to determine whether an analysis provides valid results. We conduct simulation studies of common RWD analytic settings in order to determine when standard analysis provides reliable estimates, and to identify factors that contribute most to estimator bias. We also outline a procedure for conducting a simulation-based sensitivity analysis for an arbitrary dataset subject to dependent left truncation. Our simulation results show that when comparing a truncated real-world arm to a non-truncated arm, we observe that the estimated hazard ratio is biased upwards, providing conservative inference. The most important data-generating parameter contributing to bias is the proportion of left truncated patients, given any level of dependence between survival and entry time. For specific datasets and analyses that may differ from our example, we recommend applying our sensitivity analysis approach to determine how results would change given varying proportions of truncation.
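A toy version of the truncation mechanism (assumed exponential scales, and only the simpler independent-truncation case; the paper's focus is the harder dependent case): patients who die before cohort entry are never observed, so naive summaries among the observed are optimistic, and the distortion tracks the truncated proportion.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

t_true = rng.exponential(scale=12.0, size=n)     # true survival, months
entry = rng.exponential(scale=6.0, size=n)       # delayed cohort entry
observed = t_true > entry                        # left truncation: must survive to entry

p_truncated = 1 - observed.mean()                # share of patients never seen
naive_mean = t_true[observed].mean()             # unadjusted mean among observed
true_mean = t_true.mean()
print(p_truncated, naive_mean, true_mean)
```

Here roughly a third of patients are truncated away and the naive mean survival among the observed is several months too high. Risk-set methods repair this when entry and survival are independent; when they are dependent, the simulation-based sensitivity analysis the authors describe is needed instead.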
Mwakazanga, D. K.; Daka, V.; Gwasupika, J. K.; Dombola, A. K.; Kapungu, K. K.; Khondowe, S.; Chongwe, G. K.; Fwemba, I.; Ogundimu, E.
Medical male circumcision (MMC) is an established HIV prevention intervention, yet concerns persist that circumcised men may adopt higher-risk sexual behaviours following the procedure. Evidence from observational studies has been inconsistent, partly because many analyses do not adequately distinguish behaviours that occur before circumcision from those that occur afterward. This study assessed the association between MMC and subsequent sexual behaviours while demonstrating how population-based cross-sectional survey data can be adapted to address this temporal challenge. We analysed nationally representative data from the 2024 Zambia Demographic and Health Survey (ZDHS), including men aged 15 - 59 years who reported their circumcision status. Men who had undergone medical circumcision were compared with uncircumcised men using a matched pseudo-cohort framework that reconstructed temporal ordering based on age at circumcision. Propensity score overlap weighting was applied to improve comparability between circumcised and uncircumcised men, and odds ratios were estimated using logistic regression models incorporating overlap weights and accounting for the complex survey design. Sexual behaviour outcomes occurring after circumcision included condom non-use at last sexual intercourse, multiple sexual partners in the past 12 months, self-reported sexually transmitted infection (STI) symptoms, and composite measures of sexual risk behaviour. The analysis included 9,609 men, of whom 33.3% were medically circumcised. MMC was associated with lower odds of condom non-use at last sexual intercourse (adjusted odds ratio [aOR] = 0.75, 95% confidence interval [CI]: 0.67 - 0.85) and lower odds of reporting any sexual risk behaviour (aOR = 0.83, 95% CI: 0.72 - 0.95). No meaningful associations were observed between MMC and reporting multiple sexual partners, self-reported STI symptoms, or higher levels of composite sexual risk behaviour. 
In this population-based study, MMC was not associated with sexual risk compensation under routine programme conditions within the overlap population defined by the weighting scheme, supporting the behavioural safety of MMC and illustrating the value of explicitly addressing temporality when analysing behavioural outcomes using cross-sectional survey data.
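The overlap-weighting step used in this study can be sketched as follows (synthetic data; the paper's full pipeline also involves the matched pseudo-cohort and the complex survey design, which are omitted here). A useful property to check is exact balance: overlap weights derived from a converged logistic propensity model balance every covariate in that model exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
X = rng.normal(size=(n, 2))                          # baseline covariates
t = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))).astype(float)

def fit_logistic(Xd, y, iters=30):
    """Newton-Raphson logistic regression."""
    b = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xd @ b))
        H = (Xd * (p * (1 - p))[:, None]).T @ Xd + 1e-9 * np.eye(Xd.shape[1])
        b += np.linalg.solve(H, Xd.T @ (y - p))
    return b

Xd = np.column_stack([np.ones(n), X])
ps = 1 / (1 + np.exp(-Xd @ fit_logistic(Xd, t)))

# Overlap weights: treated get 1 - ps, controls get ps
w = np.where(t == 1, 1 - ps, ps)

# Exact-balance property: weighted covariate means coincide across groups
diff = [abs(np.average(X[t == 1, j], weights=w[t == 1])
            - np.average(X[t == 0, j], weights=w[t == 0])) for j in range(2)]
print(diff)
```

The weights emphasise patients who plausibly could have been in either group, which is why the abstract's conclusions are framed as applying "within the overlap population defined by the weighting scheme".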
Collin, L. J.; MacLehose, R. F.; Ahern, T. P.; Goodman, M.; Lash, T. L.
Background: An internal validation substudy compares an imperfect measurement of a variable with a gold standard measurement in a subset of the study population. Validation data permit calculation of a bias-adjusted estimate, expected to equal the association that would have been observed had the gold standard measurement been available for the entire study population. Guidance on optimal sampling of participants to include in validation substudies has not considered monitoring validation data as they accrue. In this paper, we develop and apply the framework of Bayesian monitoring to determine when sufficient validation data have been collected to yield a bias-adjusted estimate of association with a prespecified level of precision. Methods: We demonstrate the utility of this method using the Study of Transition, Outcomes and Gender, a cohort study of transgender and gender non-conforming children and adolescents. Transmasculine and transfeminine status were determined from the gender code in the electronic medical record at cohort enrollment. This status is known to be misclassified because it can indicate either gender identity or sex recorded at birth. Our interest is in the association between transmasculine and transfeminine status and self-inflicted injury. To address possible exposure misclassification, we demonstrate the method's ability to determine when sufficient validation data have been collected to calculate a bias-adjusted estimate of association that has precision less than 80% greater than the precision of the conventional estimate. Results: In the conventional age-adjusted analysis, we observed that transmasculine children and adolescents were 1.80-fold more likely to inflict self-harm than transfeminine youths (95%CI 1.27, 2.55).
Using the adaptive validation approach, 200 cohort members were required for validation to yield a bias-adjusted estimate of OR=3.03 (95%CI 1.76, 5.56), which was similar to the bias-adjusted estimate using complete validation data (OR=2.63, 95%CI 1.67, 4.23). Conclusions: Our method provides a novel approach to effective and efficient estimation of classification parameters as validation data accrue. This method can be applied within the context of any parent epidemiologic study design, and modified to meet alternative criteria given specific study or validation study objectives.
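The validation data in this design feed a downstream misclassification adjustment. A minimal version of the classical matrix-method back-calculation such data enable (illustrative numbers, not the study's data; the paper's actual contribution is the Bayesian monitoring of when enough validation data have accrued):

```python
def adjusted_or(a, b, c, d, se, sp):
    """Bias-adjust a 2x2 table (exposed/unexposed cases a,b; exposed/unexposed
    controls c,d) for nondifferential exposure misclassification, given
    sensitivity `se` and specificity `sp` estimated from validation data."""
    A = (a - (1 - sp) * (a + b)) / (se + sp - 1)   # expected truly exposed cases
    C = (c - (1 - sp) * (c + d)) / (se + sp - 1)   # expected truly exposed controls
    B, D = (a + b) - A, (c + d) - C
    return (A * D) / (B * C)

# A true table A=50, B=50, C=25, D=75 (OR = 3) seen through se=0.8, sp=0.9
# produces these expected observed counts:
obs = (45.0, 55.0, 27.5, 72.5)
naive_or = (obs[0] * obs[3]) / (obs[1] * obs[2])    # biased toward the null
print(naive_or, adjusted_or(*obs, se=0.8, sp=0.9))  # ~2.16 vs exactly 3.0
```

The observed odds ratio is attenuated toward the null, and the back-calculation recovers the true value exactly when the classification parameters are known; in practice `se` and `sp` come with uncertainty, which is what motivates monitoring how much validation data are enough.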
Follmann, D.; Proschan, M.
Phase III platform trials are increasingly used to evaluate a sequence of treatments for a specific disease. Traditional approaches to structure such trials tend to focus on the sequential questions rather than the performance of the entire enterprise. We consider two-stage trials where an early evaluation is used to determine whether to continue with an individual study. To evaluate performance, we use the ratio of expected wins (RW), that is, the expected number of reported efficacious treatments using a two-stage approach compared to that using standard phase III trials. We approximate the test statistics during the course of a single trial using Brownian motion and determine the optimal stage 1 time and type I error rate to maximize RW for fixed power. At times, a surrogate or intermediate endpoint may provide a quicker read on potential efficacy than use of the primary endpoint at stage 1. We generalize our approach to the surrogate endpoint setting and show improved performance, provided a good quality and powerful surrogate is available. We apply our methods to the design of a platform trial to evaluate treatments for COVID-19 disease.
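A rough simulation of the design logic (all design values below are assumed, and the stopping rule is a crude futility bound; the paper derives optimal choices analytically via the Brownian motion approximation): stopping futile trials early frees trial time for new candidates, which can raise the expected number of reported wins per unit of trial time.

```python
import numpy as np

rng = np.random.default_rng(6)
sims = 200_000
p_eff, theta, t1, c1, crit = 0.2, 3.0, 0.25, 0.0, 1.96  # assumed design values

eff = rng.random(sims) < p_eff                 # is this candidate truly effective?
drift = np.where(eff, theta, 0.0)

# Brownian-motion approximation of the test-statistic path
b1 = drift * t1 + np.sqrt(t1) * rng.normal(size=sims)                 # B(t1)
bf = b1 + drift * (1 - t1) + np.sqrt(1 - t1) * rng.normal(size=sims)  # B(1)
z1 = b1 / np.sqrt(t1)

go = z1 > c1                                   # stage-1 futility rule
win2 = go & (bf > crit)                        # two-stage reported win
dur2 = np.where(go, 1.0, t1)                   # stopped trials free up time
win1 = bf > crit                               # standard single-look trial

# Ratio of expected wins per unit of trial time, two-stage vs standard
rw = (win2.mean() / dur2.mean()) / (win1.mean() / 1.0)
print(rw)
```

With mostly null candidates, the futility look discards half of them after a quarter of the trial time at little cost in power, so wins per unit time improve noticeably.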
Stein, D. W.; Gaspar, F.; Piantadosi, S.; Amin, A.; Webb, B.; Lu, D.; D'Arinzo, L.; Oliver, M.; Fitzgerald, K.
Methods of causal inference are used to estimate treatment effectiveness for non-randomized study designs. The propensity score (i.e., the probability that a subject receives the study treatment conditioned on a set of variables related to treatment and/or outcome) is often used with matching or sample weighting techniques to, ideally, eliminate bias in the estimates of treatment effect due to treatment decisions. If multiple treatments are available, the propensity score is a function of the adjustment set and the set of possible treatments. This paper develops a compound model that separates the treatment decision into a binary decision: treat or don't treat; and a potential treatment decision: choose the treatment that would be given if the subject is treated. It is applicable if the treatment set is finite, treatments are given at one time point, and the outcome is observed at a fixed time point. This representation can reduce bias when not all treatments are available to all patients. Multiple treatment stabilized marginal structural weights were calculated with this approach, and the method was applied to an observational study to evaluate the effectiveness of different neutralizing monoclonal antibodies to treat infection with various severe acute respiratory syndrome coronavirus 2 variants.
Steier, J.
Background: Viral interference, in which infection by one pathogen reduces susceptibility to another at the population level, may shape respiratory virus dynamics. Inference from surveillance data is complicated by time-varying testing behavior that can induce correlated detection patterns without any biological interaction. Methods: I developed a two-pathogen renewal model augmented with a ratio penalty that constrains interference estimates to be consistent with observed log-odds ratios of pathogen positivity. The penalty treats other-pathogen positives as implicit controls for shared testing propensity, adapting test-negative design logic to aggregate surveillance. I applied the model to US NAAT surveillance data reported to NREVSS (RSV and COVID-19; October 2020 to February 2026), validated parameter recovery in synthetic experiments, and quantified uncertainty via block bootstrap. I note at the outset that the method is conservative by design: synthetic experiments confirm a bias toward null interference estimates, so near-zero findings should not be read as proof that interference is absent. Results: Without the ratio penalty, estimated interference was |θ|_sum = 0.0082 for RSV → COVID. With the penalty, this decreased to 0.0016 (80% reduction). Bootstrap 95% intervals included zero for all direction × lag combinations. Synthetic validation confirmed high specificity at θ = 0 but revealed that the method cannot recover moderate interference (θ ≤ 0.05), because virus-specific transmissibility deviations absorb the interference signal during Stage 1 estimation. A diagnostic decomposition showed that the ratio penalty term amplifies this bias-to-null: at θ = 0.01 in real data, the ratio penalty contributes a -314,000 log-joint penalty, roughly 130 times the multinomial penalty alone. Two-stage estimation was justified empirically; joint MAP estimation failed to converge across all tested configurations.
Conclusions: The ratio penalty functions as a conservative diagnostic screen with high specificity but limited sensitivity. When applied to RSV-COVID surveillance, it substantially reduces interference point estimates, with confidence intervals spanning zero. These results indicate that apparent interference signals in these data are not robust to this particular adjustment, but the method's known conservative bias means biological interference cannot be excluded. The approach is best understood as a sensitivity analysis rather than a definitive test. Author Summary: When one respiratory virus circulates widely, it may temporarily suppress transmission of others, a phenomenon called viral interference. Detecting interference from disease surveillance data is difficult because testing behavior changes over time: when any respiratory illness surges, more people seek tests, potentially creating correlated patterns that mimic biological interaction. I developed a statistical method to probe this confounding. Borrowing logic from vaccine studies, the method penalizes the model when its predictions diverge from the observed ratio of positive tests across pathogens. The idea is that this ratio should be stable if testing propensity fluctuates but affects all pathogens similarly. Applied to five years of US surveillance data for RSV and COVID-19, this penalty reduced apparent interference by 80%, with statistical uncertainty intervals including zero. Crucially, the method is intentionally conservative: simulation experiments show it also diminishes real interference signals, because transmissibility parameters absorb the interference effect before it can be estimated. My near-zero estimates therefore do not prove interference is absent; rather, they indicate that apparent signals in these data are not robust to this particular adjustment for testing composition.
This work highlights that surveillance-based interference estimates may be sensitive to testing artifacts and provides one approach for assessing this sensitivity.
Hegde, S.; Eisenberg, J. N.; Beesley, L. J.; Mukherjee, B.
Epidemiologic data often violate common modeling assumptions of independence between subjects due to study design. Statistical separation is also common, particularly in the study of rare binary outcomes. Statistical separation for binary outcomes occurs when regions of the covariate space have no variation in the outcome, and separation can negatively impact the validity of logistic regression model parameters. When data are correlated, we generally use multi-level modeling for parameter estimation, and statistical approaches have also been developed for handling statistical separation. Approaches for analyzing data with both separation and complex correlation, however, are not well-known. Extending prior work, we demonstrate a two-stage Bayesian modeling approach to account for both separated and highly correlated data through a motivating example examining the effect of social ties on Acute Gastrointestinal Illness (AGI) in rural Ecuador. The two-stage approach involves fitting a Bayesian hierarchical model to account for correlation using priors derived from parameter estimates from a Firth-corrected logistic regression model to account for separation. We compare estimates from the two-stage approach to standard regression methods that only account for either separation or correlation. Our results demonstrate that correctly accounting for separation and correlation when both are present can potentially provide better inference.
McIntyre, K. J.; Wiener, J. C.; Davies Smith, E.
The Table 2 Fallacy is an interpretation error commonly encountered in medical literature. This fallacy occurs when coefficient estimates in multivariable regression models, apart from that of the primary exposure, are interpreted as total effects on the outcome. Causal diagrams can be used to identify sets of covariates that, when adjusted for, allow for unbiased estimation and correct interpretation of multiple total effects of interest. However, proper investigation of multiple total effects requires fitting several regression models and conducting multiple inferences. As the number of inferences increases, so does the probability of a false-positive finding, a phenomenon known as multiplicity. While multiple comparison procedures are recognized as a critical consideration in randomized controlled trials, opinion remains divided on their use within observational studies. This commentary highlights how multiplicity may arise alongside the Table 2 Fallacy, and how causal diagrams can be used in conjunction with multiple comparison procedures to simultaneously avoid this fallacy, control the risk of spurious findings, and further align the best practices of experimental and observational studies.
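One standard multiple comparison procedure of the kind the commentary alludes to is Holm's step-down adjustment, which could be applied across the p-values from the several regression models fitted for the different total effects. A minimal sketch, with hypothetical example p-values:

```python
def holm_adjust(pvals):
    # Holm step-down adjustment: sort the p-values, multiply the k-th
    # smallest (0-indexed) by (m - k), enforce monotonicity, cap at 1.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adj[i] = running_max
    return adj

# e.g. p-values from three separate total-effect models (hypothetical)
adj = holm_adjust([0.01, 0.04, 0.03])
print(adj)   # approximately [0.03, 0.06, 0.06]
```

Unlike a single Bonferroni factor, the step-down structure keeps more power for the smallest p-values while still controlling the family-wise error rate.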
Williams, J. H.
Background: Women from ethnic minorities have worse obstetric outcomes. Possible reasons for this are (1) social deprivation; (2) different standards of obstetric care; and (3) intrinsic ethnic differences. Here I aim to disentangle (1)-(3). Methods: I constructed two path models of causal links between parental ethnicity and obstetric outcomes. The first, no-racism, model estimated independent causal effects of ethnicity, deprivation and payment source on pregnancy and birth outcomes. The second, realistic, model additionally tested how far deprivation and payment source may mediate effects of ethnicity. Analyses of the models used Bayesian estimation. I analysed both the full sample of complete data and a random 1% sample. Findings: Data were complete for 762,786 births. The no-racism model did not fit the data, but the realistic model fitted adequately. It indicated that ethnicity, social deprivation, and private funding for care all adversely affected outcomes: (i) African American and Hispanic ethnicity caused deprivation; (ii) deprivation increased pregnancy hypertension, shortened gestation and reduced birthweight; (iii) private funding directly increased pregnancy hypertension and indirectly shortened gestation; (iv) participation in the Supplemental Nutrition Program for Women, Infants and Children (WIC) counteracted adverse effects of deprivation; (v) independently of (i)-(iv), ethnic-minority parents had shorter gestation and lighter babies. Interpretation: Deprivation largely accounts for adverse obstetric outcomes in ethnic minorities. Private funding may also worsen pregnancy hypertension, but WIC improved outcomes. The uniformity of adverse birth outcomes for all ethnic minorities suggests that these result from a common factor, which may be systemic racism. Policies to reduce deprivation and increase government-funded care could importantly improve obstetric outcomes, irrespective of ethnicity. Funding: None; I undertook the study at home. 
Research in Context. Evidence before this study: Many studies during the past century have shown that ethnic minorities have worse social deprivation and worse access to health services. Ethnicity, deprivation and care can all determine health outcomes, and ethnic-minority mothers have worse obstetric outcomes. However, the independent contributions of ethnicity, deprivation and care to these adverse outcomes are unknown. Added value of this study: I present here a causal model of routine observational data that differentiates direct and indirect effects of ethnicity, deprivation and payment source on obstetric outcomes. The model allows (a) deprivation to mediate effects of ethnicity and (b) payment source to mediate effects of both ethnicity and deprivation. Hence, this model can disentangle the "intertwined" effects of ethnicity, deprivation and payment source on obstetric outcomes. The model also examines effects of participation in the Supplemental Nutrition Program for Women, Infants and Children on outcomes. The model fitted a 1% sample of the data after Bayesian estimation, so it bears interpretation as a representation of the real-world causal structure of the data. In the model, minority ethnicity causes deprivation and medical insurance, and all of these factors independently determine adverse obstetric outcomes. Notably, medical insurance and private payment may increase the risk of pregnancy hypertension and consequently shorten gestation. Participation in WIC was beneficial. Implications of all the available evidence: Causal modelling of routine natality data may allow effective audit of health care in its social context. Understanding causes of poor outcomes can enable prediction of effects of policy change. The present results indicate that policies to ameliorate social deprivation and expand access to WIC and government-subsidised care should improve obstetric outcomes, with long-term benefits for both mothers and their babies. 
Extrapolating beyond obstetrics, the present results may help to illuminate mechanisms of the healthcare crisis in America.
Curnow, E.; Carpenter, J. R.; Heron, J. E.; Cornish, R. P.; Rach, S.; Didelez, V.; Langeheine, M.; Tilling, K.
Background: Epidemiological studies often have missing data. Multiple imputation (MI) is a commonly-used strategy for such studies. MI guidelines for structuring the imputation model have focused on compatibility with the analysis model, but not on the need for the (compatible) imputation model(s) to be correctly specified. Standard (default) MI procedures use simple linear functions. We examine the bias this causes and the performance of methods to identify problematic imputation models, providing practical guidance for researchers. Methods: By simulation and real data analysis, we investigated how imputation model mis-specification affected MI performance, comparing results with complete records analysis (CRA). We considered scenarios in which imputation model mis-specification occurred because (i) the analysis model was mis-specified, or (ii) the relationship between exposure and confounder was mis-specified. Results: Mis-specification of the relationship between outcome and exposure, or between exposure and confounder in the imputation model for the exposure, could result in substantial bias in CRA and MI estimates (in addition to any bias in the full-data estimate due to analysis model mis-specification). MI by predictive mean matching could mitigate model mis-specification. Model mis-specification tests were effective in identifying mis-specified relationships. These could be easily applied in any setting in which CRA was, in principle, valid and data were missing at random (MAR). Conclusion: When using MI methods that assume data are MAR, compatibility between the analysis and imputation models is necessary, but is not sufficient to avoid bias. We propose an easy-to-follow, step-by-step procedure for identifying and correcting mis-specification of imputation models.
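Predictive mean matching, mentioned above as a mitigation for imputation-model mis-specification, can be sketched roughly as follows. This is a single-imputation simplification: proper MI would also draw the regression coefficients from their posterior before matching, and the linear model, `k = 5` donor pool, and simulated data are all illustrative assumptions.

```python
import numpy as np

def pmm_impute(x, y, k=5, seed=None):
    # Predictive mean matching, single-imputation sketch:
    # (1) fit y ~ x on complete cases, (2) predict for everyone,
    # (3) for each missing y, donate an observed y drawn from the k cases
    #     with the closest predicted values.
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta
    y_imp = y.copy()
    for i in np.where(~obs)[0]:
        donors = np.argsort(np.abs(pred[obs] - pred[i]))[:k]
        y_imp[i] = y[obs][rng.choice(donors)]
    return y_imp

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + rng.normal(0, 0.1, 40)
y[::8] = np.nan                       # make five values missing
y_imp = pmm_impute(x, y, k=5, seed=3)
print(np.isnan(y_imp).sum())          # 0: every gap filled with a donor value
```

Because each imputed value is an actually observed one, PMM stays on the support of the data even when the working regression is mis-specified, which is the property the abstract exploits.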
Ahlqvist, V. H.; Sjoqvist, H.; Sjolander, A.; Berglind, D.; Lambert, P. C.; Lee, B. K.; Madley-Dowd, P.
Objective: Findings from family-based analyses, such as sibling comparisons, are often reported using only odds ratios or hazard ratios. We demonstrate how this can be improved upon by applying the marginalized between-within framework. Study Design and Setting: We provide an overview of sibling comparison methods and the marginalized between-within framework, which enables estimation of absolute risks and clinically relevant metrics while accounting for shared familial confounding. We illustrate the approach using Swedish registry data to examine the association between maternal smoking and infant mortality, estimating absolute risk differences, average treatment effects, attributable fractions, and numbers needed to harm (or treat). Results: The marginalized between-within model decomposes effects into within- and between-family components while applying a global baseline across all families. Although it typically yields similar relative estimates to conditional logistic or stratified Cox regression, the model's specification of a baseline enables the estimation of absolute measures. In the applied example, absolute measures provided more interpretable and policy-relevant insights than relative estimates alone. Code for implementation in Stata and R is provided. Conclusion: The marginalized between-within framework may strengthen the interpretability of family-based analysis by enabling absolute and policy-relevant estimates for both binary and time-to-event outcomes, moving beyond the limitations of solely relying on relative effect measures. What is new? Key Findings:
- Findings from sibling analyses are typically presented using only relative measures, such as odds ratios or hazard ratios, limiting interpretability.
- This study illustrates how the marginalized between-within framework can be used to derive clinically relevant absolute effect measures while adjusting for shared familial confounding.
What this adds to what was known?
- Unlike conventional methods, this approach enables estimation of absolute risks, average treatment effects, attributable fractions, and numbers needed to treat or harm, using standard software, while accounting for unmeasured familial confounding.
What is the implication and what should change now?
- Researchers conducting sibling comparisons should consider adopting the marginalized between-within framework to report both relative and absolute effect measures.
- This shift could enhance the clinical and public health relevance of family-based designs by improving interpretability and communication of findings.
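Since the paper's Stata/R code is not reproduced here, a rough Python sketch of the between-within decomposition with a marginal (standardized) risk difference may help fix ideas. The simulated sibling data, logistic specification, and the choice to shift only the within term during standardization are illustrative assumptions, not necessarily the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n_fam = 2000
u = rng.normal(0, 1, n_fam)                                    # shared familial confounder
x = (rng.normal(u[:, None], 1, (n_fam, 2)) > 0).astype(float)  # two siblings' exposures
p = 1 / (1 + np.exp(-(-2 + 0.7 * x + 0.8 * u[:, None])))
y = (rng.random((n_fam, 2)) < p).astype(float)

xb = np.repeat(x.mean(axis=1), 2)   # between term: family mean exposure
xw = x.ravel() - xb                 # within term: sibling's deviation from family mean
X = np.column_stack([np.ones(2 * n_fam), xb, xw])
yf = y.ravel()

beta = np.zeros(3)                  # plain Newton/IRLS logistic fit
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))
    W = mu * (1 - mu)
    beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (yf - mu))

def std_risk(d):
    # Standardized risk if every sibling's exposure were d, shifting only
    # the within term so the between term keeps the familial confounding.
    Xc = X.copy()
    Xc[:, 2] = d - xb
    return (1 / (1 + np.exp(-Xc @ beta))).mean()

rd = std_risk(1.0) - std_risk(0.0)  # marginal risk difference on the absolute scale
print(round(rd, 3))
```

The within coefficient plays the role of the familial-confounding-adjusted effect, and averaging predicted risks over the sample is what turns the relative (odds-ratio-scale) estimate into the absolute measures the abstract advocates.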