Epidemiology
Ovid Technologies (Wolters Kluwer Health)
Preprints posted in the last 90 days, ranked by how well they match Epidemiology's content profile, based on 26 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Hazewinkel, A.-D.; Gregson, J.; Bartlett, J. W.; Gasparyan, S. B.; Wright, D.; Pocock, S.
Objectives: To introduce a new covariate adjustment method for hierarchical outcomes using ordinal logistic regression, compare it with existing approaches, and assess whether adjustment improves power in randomized trials with hierarchical outcomes. Methods: We developed an ordinal regression-based method for covariate adjustment of the win ratio and compared it with three alternatives: probability index models, inverse probability weighting, and a randomization-based estimator. Methods were applied to the EMPEROR-Preserved trial and tested through extensive simulations involving two common hierarchical outcome structures: time-to-event composites, and composites combining time-to-event with quantitative measures. Simulations assessed impacts on estimates, standard errors, and power across prognostic and non-prognostic settings. Results: In RCT data and simulations, covariate adjustment consistently increased power when adjusting for prognostic baseline variables. Gains were comparable to or greater than those in conventional Cox models, with no power loss for non-prognostic covariates. Our ordinal approach performed similarly to existing methods while providing interpretable covariate effect estimates. Adjusting for baseline values of quantitative components yielded power gains according to the baseline-to-follow-up correlation. Conclusions: Covariate adjustment for prognostic variables meaningfully improves efficiency in win ratio analyses for hierarchical outcomes. Our ordinal method is easily implemented and facilitates covariate effect interpretation. We recommend the broader adoption of covariate adjustment and our ordinal method in randomized trials using hierarchical outcomes.
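For readers unfamiliar with the win ratio, the following is a minimal sketch of the unadjusted statistic only, not the authors' ordinal regression method. It assumes each patient is a tuple of (survival time, quantitative score), that larger values are better at every level of the hierarchy, and that there is no censoring; the function names and the toy data are hypothetical.

```python
from itertools import product

def compare_pair(treated, control):
    """Return 1 if the treated patient wins, -1 if the control wins, 0 if tied.

    Outcomes are compared in hierarchical order: the first component that
    differs decides the pair. Censoring is ignored (simplifying assumption).
    """
    for t_val, c_val in zip(treated, control):
        if t_val > c_val:
            return 1
        if t_val < c_val:
            return -1
    return 0

def win_ratio(treated_group, control_group):
    """Unadjusted win ratio: treated wins divided by treated losses."""
    results = [compare_pair(t, c) for t, c in product(treated_group, control_group)]
    wins = results.count(1)
    losses = results.count(-1)
    if losses == 0:
        raise ValueError("win ratio is undefined when there are no losses")
    return wins / losses

# Hypothetical toy data: (survival time, quantitative score)
treated = [(10, 5), (8, 7), (10, 9)]
control = [(10, 4), (6, 7), (8, 7)]
print(win_ratio(treated, control))
```

In this toy example the nine pairwise comparisons give seven wins, one loss, and one tie, so the win ratio is 7.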
Hripcsak, G.; Anand, T.; Chen, H. Y.; Zhang, L.; Chen, Y.; Suchard, M. A.; Ryan, P. B.; Schuemie, M. J.
Propensity score adjustment is commonly used in observational research to address confounding. Controversy persists about how to select covariates as possible confounders to generate the propensity model. A desire to include all possible confounders is offset by a concern that more covariates will augment bias or increase variance. Much of the concern is over instruments, which are variables that affect the treatment but not the outcome. Adjusting for an instrument has been shown to increase bias due to unadjusted confounding and to increase the variance of the effect estimate. Large-scale propensity score (LSPS) adjustment includes most available pre-treatment covariates in its propensity model. It addresses instruments with a pair of diagnostics, ceasing the analysis if any covariate exceeds a correlation coefficient of 0.5 with the treatment and checking for an aggregation of instruments with equipoise reported as a preference score. Our simulation assesses the impact of adjusting for instruments in the context of LSPS's diagnostics. In our simulation, even when the variance of the treatment contributed by the adjusted instrument(s) exceeds an unadjusted confounder by over twenty-fold, when the correlation between the instrument(s) and the treatment was less than 0.5 and the equipoise was greater than 0.5, the additional shift in the effect estimate due to adjusting for the instrument(s) was less than the shift due to confounding by itself. Therefore, we find in this simulation that adjusting for instruments contributed a minor amount of bias to the effect estimate. This simulation aligns well with a previous assessment of the impact of adjusting for instruments and with separate empirical evidence that adjusting for many covariates surpasses attempts to identify a limited set of confounders.
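The equipoise diagnostic mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the usual preference score transform (logit of the propensity score minus logit of the treatment prevalence) and the common convention of counting preference scores between 0.3 and 0.7 as "in equipoise". The function names and input values are hypothetical.

```python
import math

def logit(x):
    return math.log(x / (1.0 - x))

def preference_score(propensity, prevalence):
    """Rescale a propensity score so that 0.5 means 'no preference'
    relative to the overall treatment prevalence."""
    z = logit(propensity) - logit(prevalence)
    return 1.0 / (1.0 + math.exp(-z))

def equipoise(propensity_scores, prevalence, low=0.3, high=0.7):
    """Fraction of patients whose preference score lies in [low, high].
    The 0.3 to 0.7 window is an assumed convention, not taken from the abstract."""
    prefs = [preference_score(s, prevalence) for s in propensity_scores]
    return sum(low <= f <= high for f in prefs) / len(prefs)

# Hypothetical propensity scores with 20% of patients treated
scores = [0.2, 0.2, 0.9, 0.05]
print(equipoise(scores, prevalence=0.2))
```

Under this sketch, a strong instrument would push propensity scores toward 0 or 1 and lower the equipoise fraction, which is what the diagnostic is meant to flag.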
Mwakazanga, D. K.; Daka, V.; Gwasupika, J. K.; Dombola, A. K.; Kapungu, K. K.; Khondowe, S.; Chongwe, G. K.; Fwemba, I.; Ogundimu, E.
Medical male circumcision (MMC) is an established HIV prevention intervention, yet concerns persist that circumcised men may adopt higher-risk sexual behaviours following the procedure. Evidence from observational studies has been inconsistent, partly because many analyses do not adequately distinguish behaviours that occur before circumcision from those that occur afterward. This study assessed the association between MMC and subsequent sexual behaviours while demonstrating how population-based cross-sectional survey data can be adapted to address this temporal challenge. We analysed nationally representative data from the 2024 Zambia Demographic and Health Survey (ZDHS), including men aged 15-59 years who reported their circumcision status. Men who had undergone medical circumcision were compared with uncircumcised men using a matched pseudo-cohort framework that reconstructed temporal ordering based on age at circumcision. Propensity score overlap weighting was applied to improve comparability between circumcised and uncircumcised men, and odds ratios were estimated using logistic regression models incorporating overlap weights and accounting for the complex survey design. Sexual behaviour outcomes occurring after circumcision included condom non-use at last sexual intercourse, multiple sexual partners in the past 12 months, self-reported sexually transmitted infection (STI) symptoms, and composite measures of sexual risk behaviour. The analysis included 9,609 men, of whom 33.3% were medically circumcised. MMC was associated with lower odds of condom non-use at last sexual intercourse (adjusted odds ratio [aOR] = 0.75, 95% confidence interval [CI]: 0.67-0.85) and lower odds of reporting any sexual risk behaviour (aOR = 0.83, 95% CI: 0.72-0.95). No meaningful associations were observed between MMC and reporting multiple sexual partners, self-reported STI symptoms, or higher levels of composite sexual risk behaviour. 
In this population-based study, MMC was not associated with sexual risk compensation under routine programme conditions within the overlap population defined by the weighting scheme, supporting the behavioural safety of MMC and illustrating the value of explicitly addressing temporality when analysing behavioural outcomes using cross-sectional survey data.
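Propensity score overlap weighting, as used in the study above, can be sketched in a few lines. This is an illustrative sketch only: it assumes the standard overlap weights (1 minus the propensity score for the exposed, the propensity score for the unexposed), ignores the complex survey design and the pseudo-cohort matching, and uses hypothetical rows of (exposed, propensity score, binary outcome).

```python
def overlap_weight(is_treated, propensity):
    """Overlap weight: exposed units get 1 - e(x), unexposed units get e(x)."""
    return 1.0 - propensity if is_treated else propensity

def weighted_risk(rows, arm):
    """Overlap-weighted outcome proportion within one exposure arm."""
    num = den = 0.0
    for is_treated, propensity, outcome in rows:
        if is_treated == arm:
            w = overlap_weight(is_treated, propensity)
            num += w * outcome
            den += w
    return num / den

def odds(p):
    return p / (1.0 - p)

# Hypothetical data: (exposed, propensity score, binary outcome)
rows = [
    (True, 0.8, 0), (True, 0.5, 1), (True, 0.6, 0),
    (False, 0.2, 1), (False, 0.5, 1), (False, 0.4, 0),
]
risk_treated = weighted_risk(rows, True)
risk_control = weighted_risk(rows, False)
odds_ratio = odds(risk_treated) / odds(risk_control)
print(round(odds_ratio, 3))
```

Units with propensity scores near 0.5 receive the largest weights, which is why the resulting estimate applies to the overlap population rather than to all surveyed men.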
Gago, J. E.; Boyer, C.; Lipsitch, M.
Background: Antimicrobial prescribing policies affect not only treated patients but also their contacts. Two-stage randomized (2SR) designs can be used to estimate these spillover effects, yet this study design has not been widely applied to evaluate antimicrobial strategies. Methods: We developed a stochastic agent-based model that simulates a hospital ward with two competing bacterial strains (drug-A-susceptible and drug-A-resistant). We used the simulation to emulate a 2SR trial: six hospital ward clusters were randomized 1:1 to either a 90/10 (90% Drug A, 10% Drug B; Drug B was assumed to have no known resistance) or a 50/50 treatment allocation strategy; individuals within clusters were then randomized to Drug A or Drug B following the assigned cluster-level allocation strategy. We estimated direct, indirect, total, and overall causal effects on incident infection and mortality. Sensitivity analyses varied the treatment effect, transmission rate, mortality structure, and number of clusters. Results: The direct effect of drug choice showed that Drug A recipients had higher mortality (due to non-concordant treatment of resistant infections). This effect varied over time as the ward's strain ecology diverged between strategies. There was also an indirect effect for Drug A recipients, reflecting spillover from higher resistant-strain prevalence under 90/10, but it was approximately null for Drug B recipients, whose broad-spectrum coverage insulated them from changes in the ward strain distribution. The overall effect (the policy-level comparison) showed that the 50/50 strategy reduced total mortality, but this net benefit concealed a redistribution: resistant-strain deaths decreased while susceptible-strain deaths increased, a consequence captured by the overall effect but invisible to the direct effect. These findings were qualitatively consistent across all sensitivity scenarios.
Conclusions: We demonstrate that antimicrobial prescribing produces spillover effects not captured by conventional individually randomized trials. These effects can substantially alter treatment outcomes in a population. We propose that the 2SR design, grounded in a formal causal framework for interference, is better suited for evaluating population-level effects of antimicrobial strategies, whether implemented as a randomized trial or emulated with observational data.
Brophy, J. M.
Objective: To explore the interpretation of unexpected results from a randomized controlled trial (RCT). Study Design and Setting: Adjunctive frequentist (power and type-M error) and Bayesian analyses were performed on a recently published RCT reporting a statistically significant relative risk reduction (p < 0.01) in atrial fibrillation (AF) recurrence for caffeinated coffee drinkers compared with abstinence. Individual patient data for the Bayesian survival models were reconstructed from the RCT published material, and priors were informed by the RCT power calculations. Results: The original RCT design had limited power for realistic effect sizes, increasing susceptibility to type-M (magnitude) error. Bayesian analyses also tempered the benefit for caffeinated coffee implied by standard statistical analysis, resulting in only modest probabilities of clinically meaningful risk reductions (e.g., an 88% probability of a hazard ratio < 0.9 and an 82% probability of a risk difference > 2%). Conclusions: Supplemental frequentist and Bayesian approaches can provide robustness checks for unexpected RCT findings, providing contextualization, clarifying distinctions between statistical and clinical significance, and guiding replication needs. Highlights: (1) Randomized controlled trial (RCT) results may be unexpected and challenge prior beliefs; (2) Supplemental frequentist and Bayesian analyses can clarify interpretation of surprising findings; (3) Power and type-M error assessments help evaluate design adequacy for realistic effects; (4) Bayesian posterior probabilities provide additional nuanced insights into contextualization and clinical significance.
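The idea of a type-M (magnitude) error can be illustrated with a small simulation. This is a generic sketch in the spirit of a design analysis, not the analysis performed in the abstract: the true effect and standard error below are hypothetical values in arbitrary units, and the estimate is assumed to be normally distributed around the true effect.

```python
import random

def retrodesign(true_effect, std_error, n_sims=100_000, z_crit=1.96, seed=1):
    """Simulate replicated estimates and summarize the significant ones.

    Returns (power, type_m), where type_m is the average absolute
    significant estimate divided by the absolute true effect.
    Assumes true_effect is nonzero.
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > z_crit * std_error:
            significant.append(abs(estimate))
    power = len(significant) / n_sims
    if not significant:
        return power, float("nan")
    type_m = (sum(significant) / len(significant)) / abs(true_effect)
    return power, type_m

# Hypothetical design: true effect equal to one standard error
power, type_m = retrodesign(true_effect=0.1, std_error=0.1)
print(f"power={power:.2f}, exaggeration ratio={type_m:.1f}")
```

With a true effect of only one standard error, power is low and the estimates that reach significance overstate the true effect severalfold, which is the pattern a type-M assessment is designed to expose.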
Goncalves, B. P.; Franco, E. L.
Timeliness of therapy initiation is a fundamental determinant of outcomes for many medical conditions, most importantly, cancer. Yet, existing inefficiencies in healthcare systems mean that delays between diagnosis and treatment frequently adversely affect the clinical outcome for cancer patients. Although estimates of effects of lag time to therapy would be informative to policymakers considering resource allocation to minimize delays in oncology, causal methods are seldom explicitly discussed in epidemiologic analyses of these lag times. Here, we propose causal estimands for such studies, and outline the protocol of a target trial that could be emulated with observational data on lag times. To illustrate the application of this approach, we simulate studies of lag time to treatment under two scenarios: one in which indication bias (Waiting Time Paradox) is present and another in which it is absent. Although our discussion focuses on oncologic outcomes, components of the proposed target trial could be adapted to study delays for other medical conditions. We believe that the clarity with which causal questions are posed under the target trial emulation framework would lead to improved quantification of the effects of lag times in oncology, and hence to better informed policy decisions.
Codi, A. M.; Rogawski McQuade, E.; Benkeser, D.
Background: The value proposition for Shigella vaccines is strengthened by the potential for vaccines to prevent linear growth faltering. However, because expected effect sizes in Phase 3 vaccine trials are small due to limited Shigella incidence, a simple comparison of growth by randomized vaccine arm is likely underpowered and may yield null or even inverse results. Methods: We consider a new approach that estimates vaccine effects in the subgroup that would be infected in the absence of vaccination, termed the naturally infected. In simulations parameterized by multi-site studies of diarrhea, we compare power for detecting linear growth effects in the naturally infected versus the full study population. We further quantify how power is impacted by trial design choices including immunization schedule, study site, and timing of growth measurements. Findings: Simple comparisons of height-for-age z-score (HAZ) by randomized vaccine arm have extremely limited power (<15%) at realistic trial sizes (n=2,500 to 20,000) and carry a risk of showing an inverse effect due to random chance. In contrast, naturally infected effects were five to ten times larger and power was up to three times higher. Using a twelve-month immunization schedule with a single growth endpoint in high-incidence settings maximized power to detect an effect. Interpretations: While realistically sized clinical trials may be underpowered to detect an effect of vaccination on growth, estimation using the naturally infected subpopulation and careful trial design improve chances of detecting an effect while mitigating risks of null or inverse results.
Irlmeier, R.; Jin, Z.; Ye, F.
Background: Simon two-stage designs for binary endpoints and their time-to-event analogues, including the Kwak and Jung method, rely on a fixed null benchmark. Their Type I error control is valid only when that benchmark is correctly specified. In practice, historical benchmarks are often inconsistent due to small samples, population heterogeneity, changing eligibility criteria, and evolving standards of care. Even modest misspecifications can substantially inflate the Type I error rate, leading to costly advancement of ineffective treatments. Methods: We propose the Interval-Null Robust (INR) two-stage design framework that accounts for uncertainty in the historical null benchmark. We define the null hypothesis as a plausible range of clinically uninteresting values: $p \in [p_{0L}, p_{0U}]$ for binary endpoints and $\lambda \in [\lambda_{0L}, \lambda_{0U}]$ (or equivalent survival probabilities) for time-to-event endpoints. Type I error is controlled uniformly over the full null interval: $\sup_{\theta \in \Theta_0} \Pr_{\theta}(\text{Go}) \le \alpha$. Under the monotonicity of the Go probability, the supremum occurs at the least favorable null configuration, $p_{0U}$ and $\lambda_{0L}$, but the design is not reduced to a point-null formulation. The interval defines the uncertainty set for error control and is used in selecting among feasible designs through robust criteria such as worst-case regret or minimal average expected sample size. Results: Across representative planning scenarios for both endpoint types, classic designs calibrated to a single benchmark exhibit substantial Type I error inflation when the true null parameter exceeds the assumed planning value. INR designs maintain the nominal Type I error rate across the full null interval, directly addressing this vulnerability to benchmark misspecification. The robustness-efficiency trade-off can be managed through design constraints and robust optimization criteria while preserving uniform Type I error control.
Conclusions: INR two-stage designs offer a transparent framework for addressing historical control uncertainty in single-arm Phase II trials. By replacing reliance on a fixed benchmark assumption with a more realistic interval of clinically plausible null values, the INR design reduces the risk of false-positive Go-decisions caused by benchmark misspecification. INR applies to both binary and time-to-event endpoints and is implemented in the open-source INRDesign R package and accompanying interactive Shiny app.
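The Type I error inflation described above can be reproduced for a binary endpoint with exact binomial calculations. The sketch below is not the INRDesign package; it is plain Python that evaluates the Go probability of a generic two-stage design (stop after stage 1 if responses are at most r1, declare Go if total responses exceed r) over a null interval. The design constants and the interval [0.15, 0.25] are hypothetical illustrations.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def go_probability(p, r1, n1, r, n):
    """Probability of a Go decision when the true response rate is p."""
    n2 = n - n1
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):          # trial continues past stage 1
        need = max(r - x1 + 1, 0)             # stage-2 responses needed for Go
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        total += binom_pmf(x1, n1, p) * tail
    return total

def worst_case_type1(p_low, p_high, design, steps=10):
    """Largest Go probability on a grid over the null interval,
    returned as (probability, response rate)."""
    grid = [p_low + (p_high - p_low) * i / steps for i in range(steps + 1)]
    return max((go_probability(p, *design), p) for p in grid)

design = (3, 13, 12, 43)                      # hypothetical (r1, n1, r, n)
point_null = go_probability(0.20, *design)    # error at the planning value
worst, p_worst = worst_case_type1(0.15, 0.25, design)
print(f"Pr(Go) at p=0.20: {point_null:.3f}; worst case {worst:.3f} at p={p_worst:.2f}")
```

Because the Go probability increases with p, the worst case sits at the upper end of the interval, matching the least favorable configuration named in the abstract for binary endpoints.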
Hagan, J.
Background. Cross-validation (CV) is widely used to estimate predictive performance, but can overestimate performance when applied at the observation level to repeated-measures data. When continuous predictor variables are measured repeatedly within subjects and the binary outcome is defined at the subject level, naive observation-level CV introduces data leakage through within-subject dependence, producing optimistically biased estimates of the area under the receiver operating characteristic curve (AUROC). The magnitude of this bias and the performance of alternative partitioning strategies have not been formally characterized for this data structure. Methods. Three CV strategies were compared for estimating subject-level AUROC in ridge logistic regression models: naive observation-level 10-fold CV, subject-level 10-fold CV, and leave-one-cluster-out (LOCO) CV. The framework was applied to a motivating clinical dataset of daily oxygenation measures and retinopathy of prematurity outcomes among 101 extremely low birth weight infants. A factorial simulation study was conducted across 162 parameter combinations varying cluster count (20-150), intraclass correlation (0.1-0.5), within-cluster autocorrelation (0.2-0.8), and outcome prevalence (10-35%), with 500 simulated datasets per condition (76,389 valid datasets total). Results. In the motivating dataset, naive CV produced optimism of +0.078 AUROC units for severe ROP prediction (15 events, 101 subjects) and +0.031 for any ROP prediction (48 events). Subject-level 10-fold CV closely approximated LOCO (deviation ≤ 0.015). In the simulation, naive CV optimism ranged from +0.039 to +0.204 across all conditions, increasing monotonically with higher ICC, higher autocorrelation, fewer clusters, and lower event rates. Subject-level 10-fold CV was essentially unbiased relative to LOCO across all 162 conditions (mean absolute deviation = 0.002). Conclusions.
Naive observation-level CV meaningfully overestimates discriminative performance in the repeated-measures binary outcome setting and should not be used. Subject-level CV partitioning effectively eliminates this bias. Accordingly, subject-level partitioning should be considered essential, not optional, when validating prediction models using repeated-measures data with subject-level outcomes.
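Subject-level partitioning, as recommended above, amounts to assigning whole subjects to folds before any rows are split. The following is a minimal standard-library sketch of that idea, not the author's implementation; the function name and the toy subject identifiers are hypothetical.

```python
import random
from collections import defaultdict

def subject_level_folds(subject_ids, k, seed=0):
    """Split row indices into k folds so that all rows from one subject
    land in the same fold (no within-subject leakage across folds)."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    folds = defaultdict(list)
    for row, s in enumerate(subject_ids):
        folds[fold_of[s]].append(row)
    return [folds[i] for i in range(k)]

# Hypothetical repeated measures: one entry per observation row
ids = ["a", "a", "b", "b", "b", "c", "d", "d"]
folds = subject_level_folds(ids, k=2)
for i, test_rows in enumerate(folds):
    held_out = sorted({ids[r] for r in test_rows})
    print(f"fold {i}: rows {test_rows}, subjects {held_out}")
```

Naive observation-level CV would instead shuffle the eight rows directly, so measurements from the same infant could appear in both the training and test sets.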
Ahlqvist, V. H.; Sjoqvist, H.; Gardner, R. M.; Lee, B. K.
Background: Sibling-matched designs control for shared familial confounding but remain vulnerable to non-shared confounders. Bi-directional sensitivity analyses, which stratify families by whether the older or younger sibling was exposed, are commonly used to assess carryover effects. We aimed to demonstrate how this methodological approach can introduce severe confounding by parity. Methods: We conducted simulations motivated by a recent epidemiological study. The true causal effect of a hypothetical exposure (prenatal acetaminophen) on neurodevelopmental outcomes was set to strictly null. To introduce parity-related confounding, baseline exposure and outcome probabilities were varied slightly by birth order. We compared conditional logistic regression effect estimates from total sibling models against bi-directional stratified models. Results: In the total simulated sibling cohort, models yielded the true null effect (odds ratio = 1.00) when adjusting for parity. However, the bi-directional analyses exhibited divergent artifactual signals. Because parity is perfectly collinear with exposure in these stratified subsets, it cannot be adjusted for. For example, when the older sibling was exposed, the odds ratio for autism spectrum disorder was 1.68; when the younger was exposed, the odds ratio was 0.60. Conclusions: Divergent estimates in bi-directional sibling analyses can be a predictable artifact of parity confounding rather than evidence of carryover effects or invalidating unmeasured bias. Overall sibling models adjusting for parity may remain robust despite divergent stratified sensitivity results.
Blackburn, A.
Introduction: The Alcohol Use Disorders Identification Test-Consumption (AUDIT-C) is a widely utilized screening tool in large-scale electronic health record (EHR) biobanks. However, its categorical, range-based survey responses present a significant challenge for epidemiological research, especially where continuous quantitative variables may be preferred. Standard workarounds, such as assigning categorical midpoints or utilizing aggregate ordinal scores for regression mapping, often introduce false mathematical precision or obscure critical behavioral nuances between drinking frequency and quantity. This report presents a novel framework for presenting and bounding categorical alcohol survey data. Materials and Methods: I developed two complementary descriptive techniques: (1) a two-dimensional cross-tabulation matrix that preserves the interaction between drinking frequency and typical quantity, and (2) a systematic bounding algorithm that applies time-interval correction factors to calculate strict lower and upper estimates of average daily alcohol consumption. To demonstrate the real-world utility of this framework, I applied these methods to three analytical descriptive scenarios within a European ancestry (EUR) cohort of the All of Us Research Program: Generalized Anxiety Disorder (GAD) prevalence (n=104,893), minor allele frequency (MAF) for the rs1229984 genetic variant (n=104,890), and self-reported active duty military service history (n=104,893). Results: Application of the cross-tabulation matrix revealed patterns across all three descriptive scenarios. For example, participants reporting the highest frequency ("4 or more times a week") combined with the highest quantity ("10 or More" drinks) demonstrated a GAD prevalence of 13.5%, compared to 5.8% among those reporting the same frequency but a low quantity ("1 or 2" drinks).
A general trend of increased anxiety in higher quantity drinkers contrasts with a general trend of decreased anxiety in higher frequency drinkers. Bounding estimates for average daily consumption ranged from 0.299 to 0.730 drinks for individuals with GAD, and 0.303 to 0.787 for those without. Those who reported having been active duty in the US Armed Forces demonstrated a general trend toward more frequent drinking and higher average daily consumption estimates (0.339 to 0.875) than those who had not (0.297 to 0.770). The minor allele of the genetic variant rs1229984 exhibited a clear effect, reducing both frequency and quantity and resulting in lower average daily consumption estimates. Conclusions: This bounding and mapping framework offers researchers a method that complements traditional midpoint and aggregate scoring methods. By explicitly defining the uncertainty inherent in categorical survey instruments and visualizing cohort distributions across intersecting behavioral axes, this methodology improves the resolution, reproducibility, and interpretability of lifestyle exposure data.
Owusu-Boaitey, N.; Meyer, M. J.; Herrera-Esposito, D.; Bottcher, L.; Lukz, M.; Cook, S.; Stoto, M. A.; Kraemer, J. D.
Seroprevalence surveys reveal the extent of humoral immunity against pathogens such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and under some circumstances represent cumulative incidence of prior infection. However, antibody waning - or seroreversion - biases these estimates by reducing assay sensitivity in a time-varying manner. Because assay sensitivity decays over time, naively using serosurveys can substantially bias estimates of SARS-CoV-2 cumulative incidence and fatality rates. The Bayesian assay-specific, time-varying sensitivity adjustment developed in this paper can reliably correct for this bias and account for the delay between infection and serosurvey. In seroprevalence studies conducted in the United States in 2020, adjusting for time-varying sensitivity increased cumulative incidence by up to 1.4-fold, with an adjustment of 1.08 for a national study. Our estimates contrast with a previously published 2-fold adjustment that did not account for assay design. This suggests that previous analyses overestimated cumulative incidence by applying seroreversion corrections that did not account for assay-specific effects, or underestimated cumulative incidence by not applying seroreversion corrections. These biases imply fatality rate underestimation and overestimation, respectively. Our model provides a framework for design-specific time-varying sensitivity corrections in seroprevalence surveys for other pathogens.
Kleper, S. L.; Melamed, R. D.
Machine learning models for causal inference aim to adjust for confounding factors that are associated with both an exposure and an outcome, creating a spurious, biased association. However, these methods are rarely empirically evaluated to assess their success in mitigating such bias. Recent advances in knowledge representation, including both foundation models and knowledge graphs, could enrich these models, but rigorous evaluations are needed in order to assess their potential. Here, we ask whether enriching existing causal inference models with knowledge representations from foundation models can improve confounding control. Rather than using semi-simulated data to address this question, we focus on examples of real confounding: we emulate target randomized active comparator trials that are subject to confounding by indication. Our results can guide researchers aiming to develop or apply methods for discovering causal effects from observational data.
Stevenson, M.; Reisner, S.; Pontes, C.; Linton, S.; Borquez, A.; Radix, A.; Schneider, J.; Cooney, E.; Wirtz, A.; ENCORE Study Group
Transgender women are routinely recruited for HIV prevention research and describe feeling over-researched, undervalued, and disconnected from the benefits of research. Research fatigue refers to the adverse impacts of research participation from the volume, frequency, or intensity of research engagement. Research beneficence, an underdeveloped construct, refers to perceptions that research participation is empowering, appreciated, and beneficial to individuals and communities. This study sought to develop and psychometrically evaluate a research fatigue and beneficence scale and examine associations with cohort retention and study procedures among transgender women in the US and Puerto Rico. We developed a novel 7-item measure of research fatigue and beneficence informed by prior literature and qualitative work with transgender women. We assessed internal consistency reliability, factor structure, convergent and divergent validity, and predictive validity with 6-month study retention outcomes and procedures among 2189 transgender women enrolled in a US nationwide cohort (April 2023-December 2024) for the full 7-item research fatigue and beneficence scale, a 4-item research beneficence subscale, and a single-item research fatigue measure. Research beneficence items demonstrated good internal consistency (0.78) and excellent model fit. Research fatigue and beneficence varied by race/ethnicity with participants of color reporting both greater empowerment and greater concerns about community-level benefits. The item "I feel that I am asked to participate in research too frequently" was associated with lower 6-month retention, greater survey missingness, and preference for less invasive HIV testing modalities. Findings highlight multiple dimensions of research experience and the need for reduced participant burden, culturally tailored study designs, and intentional dissemination efforts to improve participant-centered research practices.
Beer, S.; Simpkin, A. J.; Eldeeb, S. Y.; Zar, H. J.; Stein, D. J.; Dunn, E. C.; Smith, A. D. A. C.
Background: In prospective cohort studies, where an exposure is collected repeatedly, interest often lies in determining whether the timing of that exposure has a differential effect on a later outcome. The Structured Life Course Modeling Approach (SLCMA), where users select between temporal hypotheses of exposure specified a priori, provides one way to analyse such longitudinal data. However, few studies using SLCMA consider the effect of time-varying covariates (TVC) which may impact associations. Methods: We present a modified version of the SLCMA - called direct and mediated effects (DME)-SLCMA - which corrects for TVC. We first develop the DME-SLCMA method, test it through simulation, and apply it to psychosocial data from the Drakenstein Child Health Study (DCHS, n=336) to investigate relationships between maternal psychopathology, TVC of socioeconomic status, and offspring depressive symptoms. Results: We found that, on average, offspring depressive symptoms score increased by 3.9% (95% CI: 1.0%-6.9%, p = 0.039) for each unit of maternal psychopathology (SRQ) at 48 months whilst adjusting for time-varying socioeconomic status (at 18, 30, 42 and 54 months). Our simulations identified several realistic scenarios where selections ignoring TVC - with TVC mediated exposure effects present - were prone to be incorrect, including our DCHS example. Conclusion: DME-SLCMA is a robust new approach for life course modelling in the presence of time-varying covariates. We recommend adjusting for TVC whenever possible, and, when not possible, our simulation study identified that scenarios where mediated effects are comparable, or greater, in magnitude to direct effects are most prone to confounding.
Obeng-Gyasi, E.
Background: Mixture epidemiology deploys sophisticated estimators, Bayesian kernel machine regression with causal mediation analysis (BKMR-CMA), quantile G-computation (QGC), and parametric G-computation, alongside conventional regression. Comparative evaluations have assumed additive, non-mediated data-generating processes, leaving conditions under which estimator choice determines causal validity uncharacterized. Methods: We developed a simulation framework using military-relevant exposure distributions (metals, per- and polyfluoroalkyl substances [PFAS], polychlorinated biphenyls [PCBs]) and allostatic load (AL) across three deployment tiers, with parameters drawn from military occupational health and contamination literature. Four data-generating processes were specified as directed acyclic graphs: direct effects with confounding (M1), full mediation through AL (M2), synergistic AL-exposure interaction (M3), and collider structure (M4). We evaluated ordinary least squares (OLS), QGC, G-computation, and BKMR-CMA on bias, root mean squared error, and 95% confidence interval coverage across 500 Monte Carlo replications at n = 500 and n = 1,000. Results: No estimator dominated across all mechanisms. Under M1, OLS and G-computation produced near-identical modest positive bias; BKMR-CMA achieved lower root mean squared error through kernel shrinkage. Under M2, BKMR-CMA exhibited severe positive bias for AL (mean bias = +0.579 SD units; coverage = 32.8%). Under M3, BKMR-CMA was the only estimator achieving nominal 95% coverage for AL (95.2%), while regression-based approaches fell to 83.6%. Under M4, G-computation produced persistent bias and near-zero coverage for lead, reflecting structural non-identification. Conclusions: Estimator validity is fundamentally mechanism-dependent. 
Researchers should base estimator choice on explicit causal assumptions about whether AL functions as confounder, mediator, moderator, or collider, particularly in military and occupational cohorts. We provide a mechanism-to-estimator mapping for applied researchers.
Orwa, F. O.; Mutai, C.; Nizeyimana, I.; Mwangi, A.
When randomized controlled trials are impractical, interrupted time series (ITS) designs offer a rigorous quasi-experimental approach to assessing population-level policies, and ITS is commonly regarded as the most robust of the quasi-experimental designs (QEDs). However, ITS designs are susceptible to serial correlation and to confounding by time-varying factors associated with both the intervention and the outcome, which may bias inference. We therefore provide a simulation-based comparison of controlled interrupted time series (CITS) and multivariable negative binomial regression for estimating policy effects in count time series data. These approaches are widely used in policy evaluations, yet their comparative performance in typical population health settings has rarely been examined directly. We tested both approaches across a range of data-generating scenarios that differed in series length, intervention effect size, and magnitude of lag-1 autocorrelation. Performance was assessed in terms of bias, standard error calibration, confidence interval coverage, mean squared error, and statistical power. Both methods gave unbiased estimates for moderate and large intervention effects, although bias was more pronounced for small effects, particularly in short series. Although point estimate performance was similar, inferential properties differed markedly. CITS consistently had smaller mean squared error, closer agreement between model-based and empirical standard errors, and confidence interval coverage near the nominal 95% level under weak to moderate autocorrelation. By contrast, multivariable regression was more sensitive to serial dependence, leading to underestimated standard errors and undercoverage, especially at moderate to high autocorrelation, regardless of Newey-West adjustments. 
These findings show the benefits of using a concurrent control series and the importance of structurally accounting for serial correlation when studying population-level policies with time series data.
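The abstract's point that serial dependence leads to underestimated standard errors can be shown with a deliberately simplified sketch. It is not the authors' design: instead of count data and negative binomial regression, it uses a Gaussian AR(1) series and compares the naive standard error of the mean (which assumes independence) with the empirical spread of the mean across replications. The function names, series length, and autocorrelation value are assumptions.

```python
import math
import random
import statistics


def ar1_series(n, rho, rng):
    """Generate n values from a stationary Gaussian AR(1) process
    with lag-1 autocorrelation rho and unit innovation variance."""
    # Start from the stationary distribution so no burn-in is needed.
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho ** 2))
    series = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def naive_vs_empirical_se(n=100, rho=0.6, replications=2000, seed=1):
    """Return (average naive SE of the mean, empirical SD of the mean)."""
    rng = random.Random(seed)
    means = []
    naive_ses = []
    for _ in range(replications):
        series = ar1_series(n, rho, rng)
        means.append(statistics.fmean(series))
        # The naive SE treats the observations as independent.
        naive_ses.append(statistics.stdev(series) / math.sqrt(n))
    return statistics.fmean(naive_ses), statistics.stdev(means)


if __name__ == "__main__":
    naive, empirical = naive_vs_empirical_se()
    print(f"naive SE: {naive:.3f}, empirical SE: {empirical:.3f}")
```

With positive autocorrelation the empirical spread exceeds the naive standard error, so intervals built from the naive value would undercover; with `rho=0.0` the two quantities roughly agree.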
Jones, L.; Ergas, R.; Tibbs, A.; Russo, E. T.; Norville, J.; Bingay, B.; Brown, C. M.; Reich, N. G.; Pasco, R.
Background: Pediatric immunizations for Respiratory Syncytial Virus (RSV), including monoclonal antibodies for infants and vaccines for pregnant people, have become broadly available and can prevent severe RSV outcomes in infants. However, quantifying the impact of RSV immunization in preventing severe pediatric illness at the population level is limited by a lack of RSV case surveillance data. The Massachusetts Department of Public Health (DPH) conducted a modeling analysis using routine public health surveillance data to estimate the state-level impact of new RSV immunization products on Emergency Department (ED) visits and hospitalizations in Massachusetts for highest risk pediatric groups. Methods: A scenario projection tool, called R.Scenario.Vax, was utilized to simulate RSV-associated ED hospital encounters by age group in the context of newly available immunizations. ED visit and hospitalization data from the National Syndromic Surveillance Program (NSSP) during the time period 10/08/2017-10/19/2024 were analyzed, scaled to account for changes in RSV testing practices over time and missing encounter volume in historic data, and utilized to inform model fit of a "typical" RSV season. RSV immunization data from the Massachusetts Immunization Information System (MIIS) for the 2023-2024 and 2024-2025 RSV seasons informed high and moderate pediatric RSV immunization coverage scenarios, and their impact was compared to a counterfactual reference scenario of no new immunizations. Median projections were quantitatively and qualitatively compared to observed 2024-2025 season data. Percent reduction in hospital encounters and encounters averted per 10,000 population were calculated for each scenario as compared to the reference. Results: Projections for the youngest at-risk age groups showed significantly lower RSV-associated ED visits and hospitalizations during the 2024-2025 season for both high and moderate immunization coverage scenarios. 
Median projections for infants under 6 months old in the highest coverage scenario, wherein nearly all infants were immunized, showed 72.6% lower ED visits and 73.4% lower hospitalizations when compared to the reference scenario, equating to 262 ED visits and 85 hospitalizations averted per 10,000 population. Conclusions: Our results support the use of modeling methods for public health insights and suggest that RSV immunizations for infant populations result in significantly lower RSV-related ED encounters in Massachusetts.
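The two summary measures named in the abstract above, percent reduction and encounters averted per 10,000 population, follow from simple arithmetic on a scenario and its reference. The sketch below shows one plausible way to compute them; the function name `scenario_impact` and all input numbers are hypothetical and are not taken from the study or from the R.Scenario.Vax tool.

```python
def scenario_impact(reference_encounters, scenario_encounters, population):
    """Compare an immunization scenario with a no-immunization reference.

    Returns (percent reduction, encounters averted per 10,000 population).
    """
    if reference_encounters <= 0 or population <= 0:
        raise ValueError("reference_encounters and population must be positive")
    averted = reference_encounters - scenario_encounters
    percent_reduction = 100.0 * averted / reference_encounters
    averted_per_10k = 10_000 * averted / population
    return percent_reduction, averted_per_10k


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    pct, per_10k = scenario_impact(
        reference_encounters=1000, scenario_encounters=400, population=50_000
    )
    print(f"{pct:.1f}% reduction, {per_10k:.1f} averted per 10,000")
```

The percent reduction depends only on the two encounter counts, while the per-10,000 figure also depends on the size of the age group, which is why the abstract reports both.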
Rowan, C. G.; Maringe, C.
Purpose: When emulating trials of medication initiation using real-world data, there may be ambiguity regarding the most suitable time zero event for the research question of interest. The time zero event must be strongly associated with the clinical indication for treatment, confer a high probability of actual treatment initiation, and be measurable with sufficient temporal precision in the source data. When it is uncertain whether a candidate event will satisfy these three conditions simultaneously, empirical identification of predictors of medication initiation can provide valuable guidance. The objective of this study was to empirically identify predictors of incident atorvastatin initiation to inform the definition of time zero for future target trial emulations. Methods: A retrospective cohort study was conducted using Medicare claims data (study period January 1, 2018 - December 31, 2019). The cohort included statin-naive beneficiaries aged ≥ 65 years with ≥ 12 months of continuous enrollment as of the study period start date and at least one new or incident prescription claim after the study period start date. Atorvastatin initiation was defined by the first dispensing (index date). Non-atorvastatin initiators (reference group) were sampled at 25%; their index date was a randomly selected date of a new medication dispensing. Candidate predictor variables were ascertained in the 6 months pre-index and included demographics, comorbidities (classified separately from inpatient and outpatient claims), healthcare utilization, and pharmacotherapy. We developed and applied an eight-step procedure to identify independent predictors of incident atorvastatin initiation. Results: The study cohort comprised 481,742 incident atorvastatin initiators and 896,575 non-atorvastatin initiators (25% random sample). 
The strongest predictors of atorvastatin initiation were inpatient admission for cerebral infarction (OR 11.51, 95% CI 10.79-12.27) and myocardial infarction (OR 5.32, 95% CI 5.03-5.62). For example, a White male with a recent inpatient diagnosis of cerebral infarction had a predicted probability of atorvastatin initiation of 82% (95% CI 81-83%). Conclusion: The empirically identified predictors of atorvastatin initiation (acute cardio/cerebrovascular events) align with ACC/AHA guidelines recommending prompt statin therapy for secondary prevention. These predictors satisfy the three key requirements for a valid time zero event and should mitigate selection bias, channeling bias, and residual confounding in future target trial emulations. KEY POINTS: Findings: Acute myocardial infarction and cerebral infarction recorded during an inpatient admission were the strongest predictors of incident atorvastatin initiation among statin-naive Medicare beneficiaries aged 65 years and older. Clinical Context: These findings align with current American College of Cardiology/American Heart Association guidelines that recommend prompt statin therapy for secondary prevention after these acute cardiovascular events. Implications for Future Research: Anchoring the time zero event to an inpatient admission for myocardial/cerebral infarction satisfies the three key requirements for a valid time zero event when studying medication initiation: it is strongly associated with the clinical indication for treatment, carries a high probability of actual statin initiation, and can be identified with sufficient temporal precision in administrative data. This approach should reduce channeling bias, selection bias (e.g., immortal time bias), and residual confounding in future target trial emulations. 
Broader Significance: The study provides an empirically derived, high-probability time zero event that can strengthen future target trial emulations using real-world data to assess the safety of commonly used medicines in older adults, a population often underrepresented in the randomized trials conducted to obtain regulatory approval. PLAIN LANGUAGE SUMMARY: This study aimed to identify a clear starting point for future research on the safety of atorvastatin in older adults. Using Medicare claims data from 2018-2019, researchers examined more than 1.3 million beneficiaries aged 65 and older who had not taken statins in the previous year. They developed a predictive model to determine which patient characteristics were most strongly linked to starting atorvastatin. The strongest predictors were a recent hospital admission for heart attack (myocardial infarction) or stroke (cerebral infarction). These events were associated with a much higher chance of promptly receiving atorvastatin, which aligns with American College of Cardiology and American Heart Association guidelines recommending statin therapy soon after such events for secondary prevention. By using hospital discharge after these acute events as the starting point for future studies, researchers can create comparisons that reduce bias and allow more reliable estimates of atorvastatin's effects on potential harms in this vulnerable elderly population.
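The abstract above reports both odds ratios and a predicted probability, and the relationship between the two can be sketched as follows. This is a simplified illustration, not the authors' eight-step procedure or their fitted model: it applies a single odds ratio to a baseline probability, whereas the study's estimates come from a multivariable model. The function name and the baseline probability of 0.10 are hypothetical; only the odds ratio of 11.51 is taken from the abstract.

```python
def probability_from_odds_ratio(baseline_probability, odds_ratio):
    """Convert a baseline probability and an odds ratio into the
    probability implied for the exposed group."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    exposed_odds = baseline_odds * odds_ratio
    return exposed_odds / (1.0 + exposed_odds)


if __name__ == "__main__":
    # 0.10 is a hypothetical baseline; 11.51 is the reported OR
    # for inpatient admission for cerebral infarction.
    p = probability_from_odds_ratio(baseline_probability=0.10, odds_ratio=11.51)
    print(f"implied probability: {p:.3f}")
```

Because odds ratios act on the odds scale, the same odds ratio implies very different absolute probabilities at different baselines, which is one reason a predicted probability for a specific patient profile is reported alongside the odds ratios.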
Hammarlund, N.; Wang, X.; Grant, D.; Purves, D.
Importance: Health systems are increasingly adopting race-neutral cardiovascular risk prediction tools, yet no study has examined how these choices redistribute preventive treatment at the point of clinical decision-making, particularly for Black individuals who already bear a disproportionate cardiovascular burden. Objective: To evaluate how including race, substituting social determinants of health (SDoH), or excluding both reshapes cardiovascular risk classification, calibration, fairness, and clinical decisions. Design: Retrospective cohort study with repeated cross-validation and integrated decision-focused evaluation, using CARDIA study data with baseline measures from 2010 and cardiovascular outcomes through 2021. Setting: Community-based longitudinal cohort recruited across multiple U.S. cities. Participants: 3,241 Black and White adults without known cardiovascular disease at baseline. Main Outcomes and Measures: Three models predicting 10-year incident cardiovascular disease were compared on predictive performance, calibration, fairness metrics, and realized clinical utility at the ACC/AHA 7.5% preventive treatment threshold. Results: Among 3,241 participants (46% Black, mean age 50 years, 6.9% CVD incidence), overall performance was similar across models (AUC 0.762 to 0.768). Predictor choice substantially reshaped clinical decisions at the guideline threshold. The SDoH-based model improved parity metrics but produced systematic underprediction and concentrated new overtreatment among Black participants. The clinical-only model further improved parity metrics but generated new undertreatment, with four cases of untreated CVD and none avoided. No single evaluative dimension captured the full equity consequences. Conclusions and Relevance: Parity metrics improved under both race-neutral models, yet both produced clinical harms concentrated among Black participants not apparent in population-average metrics. 
The case for race removal has rested on conceptual grounds, but comprehensive empirical evaluation is necessary before health systems can be confident their model choices truly serve those most at risk.
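The decision-focused evaluation described in the abstract above compares who would be treated under different models at the 7.5% risk threshold. The sketch below shows one way such a comparison could be tabulated. The category definitions, the function name `decision_changes`, the use of "at or above" the threshold, and all risk values are assumptions made for illustration; they are not the authors' definitions or CARDIA data.

```python
TREATMENT_THRESHOLD = 0.075  # 7.5% 10-year risk threshold named in the abstract


def decision_changes(reference_risk, alternative_risk, had_event,
                     threshold=TREATMENT_THRESHOLD):
    """Count how treatment decisions change when switching models.

    Assumed definitions (illustrative only):
      new_undertreatment     - had an event, treated under reference only
      avoided_undertreatment - had an event, treated under alternative only
      new_overtreatment      - no event, treated under alternative only
      avoided_overtreatment  - no event, treated under reference only
    """
    counts = {
        "unchanged": 0,
        "new_undertreatment": 0,
        "avoided_undertreatment": 0,
        "new_overtreatment": 0,
        "avoided_overtreatment": 0,
    }
    for ref, alt, event in zip(reference_risk, alternative_risk, had_event):
        treated_ref = ref >= threshold
        treated_alt = alt >= threshold
        if treated_ref == treated_alt:
            counts["unchanged"] += 1
        elif treated_ref:  # treatment withdrawn under the alternative model
            key = "new_undertreatment" if event else "avoided_overtreatment"
            counts[key] += 1
        else:  # treatment newly recommended under the alternative model
            key = "avoided_undertreatment" if event else "new_overtreatment"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical predicted risks and outcomes for six people.
    reference = [0.08, 0.09, 0.05, 0.06, 0.10, 0.03]
    alternative = [0.07, 0.10, 0.08, 0.05, 0.06, 0.09]
    events = [True, False, False, False, False, True]
    print(decision_changes(reference, alternative, events))
```

Tabulating the counts separately by subgroup would show whether decision changes concentrate in one group even when overall discrimination is similar, which is the kind of pattern the abstract describes.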