Epidemiology
Ovid Technologies (Wolters Kluwer Health)
Preprints posted in the last 30 days, ranked by how well they match Epidemiology's content profile, based on 26 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Hazewinkel, A.-D.; Gregson, J.; Bartlett, J. W.; Gasparyan, S. B.; Wright, D.; Pocock, S.
Objectives: Introducing a new covariate adjustment method for hierarchical outcomes using ordinal logistic regression, comparing it with existing approaches, and assessing whether adjustment improves power in randomized trials with hierarchical outcomes. Methods: We developed an ordinal regression-based method for covariate adjustment of the win ratio and compared it with three alternatives: probability index models, inverse probability weighting, and a randomization-based estimator. Methods were applied to the EMPEROR-Preserved trial and tested through extensive simulations involving two common hierarchical outcome structures: time-to-event composites and composites combining time-to-event with quantitative measures. Simulations assessed impacts on estimates, standard errors, and power across prognostic and non-prognostic settings. Results: In RCT data and simulations, covariate adjustment consistently increased power when adjusting for prognostic baseline variables. Gains were comparable to or greater than those in conventional Cox models, with no power loss for non-prognostic covariates. Our ordinal approach performed similarly to existing methods while providing interpretable covariate effect estimates. Adjusting for baseline values of quantitative components yielded power gains according to the baseline-to-follow-up correlation. Conclusions: Covariate adjustment for prognostic variables meaningfully improves efficiency in win ratio analyses for hierarchical outcomes. Our ordinal method is easily implemented and facilitates covariate effect interpretation. We recommend the broader adoption of covariate adjustment and our ordinal method in randomized trials using hierarchical outcomes.
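The pairwise computation underlying the unadjusted win ratio can be sketched in a few lines of Python. The two-level structure (survival time, then a quantitative tiebreak) and the data below are hypothetical illustrations, not the EMPEROR-Preserved analysis itself:

```python
# Illustrative unadjusted win ratio for a two-level hierarchical outcome:
# compare every treated/control pair first on survival time (longer wins),
# breaking ties on a quantitative measure (higher wins).
# Data are made up for illustration.

def win_ratio(treated, control):
    """Each patient is a (survival_time, quant_measure) tuple."""
    wins = losses = 0
    for t_surv, t_q in treated:
        for c_surv, c_q in control:
            if t_surv != c_surv:          # level 1: time-to-event
                wins += t_surv > c_surv
                losses += t_surv < c_surv
            elif t_q != c_q:              # level 2: quantitative tiebreak
                wins += t_q > c_q
                losses += t_q < c_q
    return wins / losses

treated = [(24, 5.0), (36, 3.2), (18, 4.1)]
control = [(12, 2.0), (24, 4.0), (30, 1.5)]
print(win_ratio(treated, control))  # 6 wins / 3 losses = 2.0
```

Covariate adjustment (the paper's contribution) would replace this raw pairwise count with a model-based estimate; the sketch only shows the estimand being adjusted.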
Mwakazanga, D. K.; daka, v.; Gwasupika, J. K.; Dombola, A. K.; Kapungu, K. K.; Khondowe, S.; Chongwe, G. K.; Fwemba, I.; Ogundimu, E.
Medical male circumcision (MMC) is an established HIV prevention intervention, yet concerns persist that circumcised men may adopt higher-risk sexual behaviours following the procedure. Evidence from observational studies has been inconsistent, partly because many analyses do not adequately distinguish behaviours that occur before circumcision from those that occur afterward. This study assessed the association between MMC and subsequent sexual behaviours while demonstrating how population-based cross-sectional survey data can be adapted to address this temporal challenge. We analysed nationally representative data from the 2024 Zambia Demographic and Health Survey (ZDHS), including men aged 15-59 years who reported their circumcision status. Men who had undergone medical circumcision were compared with uncircumcised men using a matched pseudo-cohort framework that reconstructed temporal ordering based on age at circumcision. Propensity score overlap weighting was applied to improve comparability between circumcised and uncircumcised men, and odds ratios were estimated using logistic regression models incorporating overlap weights and accounting for the complex survey design. Sexual behaviour outcomes occurring after circumcision included condom non-use at last sexual intercourse, multiple sexual partners in the past 12 months, self-reported sexually transmitted infection (STI) symptoms, and composite measures of sexual risk behaviour. The analysis included 9,609 men, of whom 33.3% were medically circumcised. MMC was associated with lower odds of condom non-use at last sexual intercourse (adjusted odds ratio [aOR] = 0.75, 95% confidence interval [CI]: 0.67-0.85) and lower odds of reporting any sexual risk behaviour (aOR = 0.83, 95% CI: 0.72-0.95). No meaningful associations were observed between MMC and reporting multiple sexual partners, self-reported STI symptoms, or higher levels of composite sexual risk behaviour.
In this population-based study, MMC was not associated with sexual risk compensation under routine programme conditions within the overlap population defined by the weighting scheme, supporting the behavioural safety of MMC and illustrating the value of explicitly addressing temporality when analysing behavioural outcomes using cross-sectional survey data.
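The overlap weighting step described above can be sketched as follows. The propensity scores and outcomes are invented for illustration; in the study they would be estimated from ZDHS covariates with survey-design adjustments:

```python
import numpy as np

# Overlap weighting sketch: treated units receive weight 1 - e(x),
# untreated units receive e(x), where e(x) is the propensity score.
# This emphasizes units in the region of covariate overlap.
# Scores and outcomes below are hypothetical, not estimated from ZDHS data.

z = np.array([1, 1, 0, 0, 0])            # 1 = circumcised, 0 = not
y = np.array([0, 0, 1, 0, 1])            # 1 = condom non-use at last sex
e = np.array([0.8, 0.6, 0.5, 0.3, 0.2])  # propensity scores

w = np.where(z == 1, 1 - e, e)           # overlap weights

# Weighted outcome means in each arm; their contrast targets the
# average treatment effect in the overlap population (ATO).
mean_treated = np.average(y[z == 1], weights=w[z == 1])
mean_control = np.average(y[z == 0], weights=w[z == 0])
print(mean_treated - mean_control)
```

In practice the contrast is usually expressed as an odds ratio from a weighted logistic regression rather than a risk difference, but the weights are constructed the same way.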
Brophy, J. M.
Objective: To explore the interpretation of unexpected results from a randomized controlled trial (RCT). Study Design and Setting: Adjunctive frequentist (power and type-M error) and Bayesian analyses were performed on a recently published RCT reporting a statistically significant relative risk reduction (p < 0.01) for caffeinated coffee drinkers compared with abstinence on atrial fibrillation (AF) recurrence. Individual patient data for the Bayesian survival models were reconstructed from the RCT's published material, with priors informed by the RCT power calculations. Results: The original RCT design had limited power for realistic effect sizes, increasing susceptibility to type-M (magnitude) error. Bayesian analyses also tempered the benefit for caffeinated coffee implied by standard statistical analysis, yielding only modest probabilities of clinically meaningful risk reductions (e.g., 88% for a hazard ratio < 0.9, or 82% for a risk difference > 2%). Conclusions: Supplemental frequentist and Bayesian approaches can provide robustness checks for unexpected RCT findings, providing contextualization, clarifying distinctions between statistical and clinical significance, and guiding replication needs. Highlights:
- Randomized controlled trial (RCT) results may be unexpected and challenge prior beliefs
- Supplemental frequentist and Bayesian analyses can clarify interpretation of surprising findings
- Power and type-M error assessments help evaluate design adequacy for realistic effects
- Bayesian posterior probabilities provide additional nuanced insights into contextualization and clinical significance
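The power and type-M calculation can be approximated by simulation in the style of Gelman and Carlin's "retrodesign" idea: given an assumed true effect and standard error, how often is the result significant, and by how much do significant estimates exaggerate the truth? The effect size and standard error below are illustrative assumptions, not the trial's values:

```python
import random
import statistics

# Retrodesign-style simulation of power and the type-M (magnitude) error.
# true_effect and se are hypothetical, chosen only to illustrate a
# low-power design; they are not taken from the coffee/AF trial.

random.seed(1)
true_effect, se = 0.1, 0.08   # e.g. a log hazard ratio scale
sims = [random.gauss(true_effect, se) for _ in range(100_000)]
significant = [est for est in sims if abs(est) > 1.96 * se]

power = len(significant) / len(sims)
type_m = statistics.mean(abs(est) for est in significant) / abs(true_effect)
print(f"power={power:.2f}, exaggeration ratio={type_m:.2f}")
```

With these inputs the design has roughly one-in-four power, and the estimates that do reach significance overstate the assumed true effect by about a factor of two, which is the phenomenon the abstract calls type-M error.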
Goncalves, B. P.; Franco, E. L.
Timeliness of therapy initiation is a fundamental determinant of outcomes for many medical conditions, most importantly, cancer. Yet, existing inefficiencies in healthcare systems mean that delays between diagnosis and treatment frequently adversely affect the clinical outcome for cancer patients. Although estimates of effects of lag time to therapy would be informative to policymakers considering resource allocation to minimize delays in oncology, causal methods are seldom explicitly discussed in epidemiologic analyses of these lag times. Here, we propose causal estimands for such studies, and outline the protocol of a target trial that could be emulated with observational data on lag times. To illustrate the application of this approach, we simulate studies of lag time to treatment under two scenarios: one in which indication bias (Waiting Time Paradox) is present and another in which it is absent. Although our discussion focuses on oncologic outcomes, components of the proposed target trial could be adapted to study delays for other medical conditions. We believe that the clarity with which causal questions are posed under the target trial emulation framework would lead to improved quantification of the effects of lag times in oncology, and hence to better informed policy decisions.
Codi, A. M.; Rogawski McQuade, E.; Benkeser, D.
Background: The value proposition for Shigella vaccines is strengthened by the potential for vaccines to prevent linear growth faltering. However, because expected effect sizes in Phase 3 vaccine trials are small due to limited Shigella incidence, a simple comparison of growth by randomized vaccine arm is likely underpowered and may yield null or even inverse results. Methods: We consider a new approach that estimates vaccine effects in the subgroup that would be infected in the absence of vaccination, termed the naturally infected. In simulations parameterized by multi-site studies of diarrhea, we compare power for detecting linear growth effects in the naturally infected versus the full study. We further quantified how power is impacted by trial design choices including immunization schedule, study site, and timing of growth measurements. Findings: Simple comparisons of height-for-age z-score (HAZ) by randomized vaccine arm have extremely limited power (<15%) at realistic trial sizes (n=2,500 to 20,000) and carry a risk of showing an inverse effect due to random chance. In contrast, naturally infected effects were five to ten times larger and power was up to three times higher. Using a twelve-month immunization schedule with a single growth endpoint in high-incidence settings maximized power to detect an effect. Interpretation: While realistically sized clinical trials may be underpowered to detect an effect of vaccination on growth, estimation using the naturally infected subpopulation and careful trial design improve the chances of detecting an effect while mitigating risks of null or inverse results.
Ahlqvist, V. H.; Sjoqvist, H.; Gardner, R. M.; Lee, B. K.
Background: Sibling-matched designs control for shared familial confounding but remain vulnerable to non-shared confounders. Bi-directional sensitivity analyses, which stratify families by whether the older or younger sibling was exposed, are commonly used to assess carryover effects. We aimed to demonstrate how this methodological approach can introduce severe confounding by parity. Methods: We conducted simulations motivated by a recent epidemiological study. The true causal effect of a hypothetical exposure (prenatal acetaminophen) on neurodevelopmental outcomes was set to strictly null. To introduce parity-related confounding, baseline exposure and outcome probabilities were varied slightly by birth order. We compared conditional logistic regression effect estimates from total sibling models against bi-directional stratified models. Results: In the total simulated sibling cohort, models yielded the true null effect (odds ratio = 1.00) when adjusting for parity. However, the bi-directional analyses exhibited divergent artifactual signals. Because parity is perfectly collinear with exposure in these stratified subsets, it cannot be adjusted for. For example, when the older sibling was exposed, the odds ratio for autism spectrum disorder was 1.68; when the younger was exposed, the odds ratio was 0.60. Conclusions: Divergent estimates in bi-directional sibling analyses can be a predictable artifact of parity confounding rather than evidence of carryover effects or invalidating unmeasured bias. Overall sibling models adjusting for parity may remain robust despite divergent stratified sensitivity results.
Blackburn, A.
Introduction: The Alcohol Use Disorders Identification Test-Consumption (AUDIT-C) is a widely utilized screening tool in large-scale electronic health record (EHR) biobanks. However, its categorical, range-based survey responses present a significant challenge for epidemiological research, especially where continuous quantitative variables may be preferred. Standard workarounds, such as assigning categorical midpoints or utilizing aggregate ordinal scores for regression mapping, often introduce false mathematical precision or obscure critical behavioral nuances between drinking frequency and quantity. This report presents a novel framework for summarizing and bounding categorical alcohol survey data. Materials and Methods: I developed two complementary descriptive techniques: (1) a two-dimensional cross-tabulation matrix that preserves the interaction between drinking frequency and typical quantity, and (2) a systematic bounding algorithm that applies time-interval correction factors to calculate strict lower and upper estimates of average daily alcohol consumption. To demonstrate the real-world utility of this framework, I applied these methods to three analytical descriptive scenarios within a European ancestry (EUR) cohort of the All of Us Research Program: Generalized Anxiety Disorder (GAD) prevalence (n=104,893), minor allele frequency (MAF) for the rs1229984 genetic variant (n=104,890), and self-reported active duty military service history (n=104,893). Results: Application of the cross-tabulation matrix revealed patterns across all three descriptive scenarios. For example, participants reporting the highest frequency ("4 or more times a week") combined with the highest quantity ("10 or More" drinks) demonstrated a GAD prevalence of 13.5%, compared to 5.8% among those reporting the same frequency but a low quantity ("1 or 2" drinks).
A general trend of increased anxiety in higher quantity drinkers contrasts with a general trend of decreased anxiety in higher frequency drinkers. Bounding estimates for average daily consumption ranged from 0.299 to 0.730 drinks for individuals with GAD, and 0.303 to 0.787 for those without. Those who reported having been active duty in the US Armed Forces demonstrated a general trend toward more frequent drinking and higher average daily consumption estimates (0.339 to 0.875) than those who had not (0.297 to 0.770). The minor allele of the genetic variant rs1229984 exhibited a clear effect reducing both frequency and quantity, resulting in lower average daily consumption estimates. Conclusions: This bounding and mapping framework provides researchers with an alternative to traditional midpoint and aggregate scoring methods. By explicitly defining the uncertainty inherent in categorical survey instruments and visualizing cohort distributions across intersecting behavioral axes, this methodology improves the resolution, reproducibility, and interpretability of lifestyle exposure data.
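The bounding idea can be sketched directly: combine the smallest and largest values consistent with each categorical response to bound average drinks per day. The category edges and the weekly conversion below are assumptions for illustration, not the paper's exact correction factors:

```python
# Sketch of bounding average daily alcohol consumption from categorical
# AUDIT-C-style responses. Category edges (min, max) are assumptions;
# the open-ended "10 or more" upper bound is arbitrarily capped at 10.

FREQ_PER_WEEK = {                 # (min, max) drinking days per week
    "monthly or less": (1 / 4.35, 1 / 4.35),   # ~1 day per month
    "2-4 times a month": (2 / 4.35, 4 / 4.35),
    "2-3 times a week": (2, 3),
    "4 or more times a week": (4, 7),
}
DRINKS = {                        # (min, max) drinks per drinking day
    "1 or 2": (1, 2),
    "3 or 4": (3, 4),
    "5 or 6": (5, 6),
    "10 or more": (10, 10),
}

def daily_bounds(freq, qty):
    """Strict lower/upper bounds on average drinks per day."""
    f_lo, f_hi = FREQ_PER_WEEK[freq]
    q_lo, q_hi = DRINKS[qty]
    return (f_lo * q_lo / 7, f_hi * q_hi / 7)

lo, hi = daily_bounds("2-3 times a week", "3 or 4")
print(lo, hi)   # 6/7 to 12/7 drinks per day
```

Reporting the (lo, hi) interval rather than a single midpoint is exactly the "explicitly defined uncertainty" the conclusion refers to.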
Stevenson, M.; Reisner, S.; Pontes, C.; Linton, S.; Borquez, A.; Radix, A.; Schneider, J.; Cooney, E.; Wirtz, A.; ENCORE Study Group,
Transgender women are routinely recruited for HIV prevention research and describe feeling over-researched, undervalued, and disconnected from the benefits of research. Research fatigue refers to the adverse impacts of research participation from the volume, frequency, or intensity of research engagement. Research beneficence, an underdeveloped construct, refers to perceptions that research participation is empowering, appreciated, and beneficial to individuals and communities. This study sought to develop and psychometrically evaluate a research fatigue and beneficence scale and examine associations with cohort retention and study procedures among transgender women in the US and Puerto Rico. We developed a novel 7-item measure of research fatigue and beneficence informed by prior literature and qualitative work with transgender women. We assessed internal consistency reliability, factor structure, convergent and divergent validity, and predictive validity with 6-month study retention outcomes and procedures among 2189 transgender women enrolled in a US nationwide cohort (April 2023-December 2024) for the full 7-item research fatigue and beneficence scale, a 4-item research beneficence subscale, and a single-item research fatigue measure. Research beneficence items demonstrated good internal consistency (0.78) and excellent model fit. Research fatigue and beneficence varied by race/ethnicity with participants of color reporting both greater empowerment and greater concerns about community-level benefits. The item "I feel that I am asked to participate in research too frequently" was associated with lower 6-month retention, greater survey missingness, and preference for less invasive HIV testing modalities. Findings highlight multiple dimensions of research experience and the need for reduced participant burden, culturally tailored study designs, and intentional dissemination efforts to improve participant-centered research practices.
ORWA, F. O.; Mutai, C.; Nizeyimana, I.; Mwangi, A.
When randomized controlled trials are impractical, interrupted time series designs offer a rigorous quasi-experimental approach to assess population-level policies. Indeed, in the context of quasi-experimental designs (QEDs), the Interrupted Time Series (ITS) method is commonly regarded as the most robust. But interrupted time series designs are susceptible to serial correlation and confounding by time-varying factors associated with both the intervention and the outcome, which may result in biased inference. We therefore provide a simulation-based comparison of controlled interrupted time series (CITS) and multivariable regression (multivariable negative binomial regression) for estimating policy effects in count time series data. These approaches are widely used in policy evaluations, yet their comparative performance in typical population health settings has rarely been examined directly. We tested both approaches across a variety of data-generating scenarios, differing in series length, intervention effect size, and magnitude of lag-1 autocorrelation. Performance was assessed via bias, standard error calibration, confidence interval coverage, mean squared error, and statistical power. Both methods gave unbiased estimates for moderate and large intervention effects, although bias was more pronounced for small effects, particularly in short series. Although point estimate performance was similar, inferential properties differed substantially. CITS consistently had smaller mean squared error, better agreement between model-based and empirical standard errors, and confidence interval coverage near the nominal 95% level for weak to moderate autocorrelation. By contrast, multivariable regression was more sensitive to serial dependence, leading to underestimated standard errors and undercoverage, especially at moderate to high autocorrelation, even with Newey-West adjustments.
These findings show the benefits of using a concurrent control series and the importance of structurally accounting for serial correlation when studying population-level policies with time series data.
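A toy version of the simulation setup: generate a count series with lag-1 autocorrelated noise and a step change at the intervention, then fit a naive segmented regression that ignores the serial correlation. All parameters are illustrative, not the paper's designs:

```python
import numpy as np

# Simulate an interrupted count time series with AR(1) noise on the
# log-rate scale and a true step change of -0.3 at the break point,
# then fit a segmented log-linear model by ordinary least squares
# (the naive comparator that ignores serial correlation).

rng = np.random.default_rng(0)
n, break_pt = 100, 50
t = np.arange(n)
post = (t >= break_pt).astype(float)

eps = np.zeros(n)                       # AR(1) noise, lag-1 coef 0.5
for i in range(1, n):
    eps[i] = 0.5 * eps[i - 1] + rng.normal(0, 0.2)

rate = np.exp(3.0 + 0.002 * t - 0.3 * post + eps)
y = rng.poisson(rate)

# Design: intercept, linear trend, post-intervention level change
X = np.column_stack([np.ones(n), t, post])
coef, *_ = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)
print(f"estimated step change: {coef[2]:.2f}")
```

The point estimate is usually close to the truth here; the paper's finding is that the *standard errors* of this naive fit are miscalibrated under autocorrelation, which a single simulated series cannot show but repeated runs would.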
Cook, S. H.
Background: Young sexual and gender minorities of color face compound health risks shaped by interlocking systems of racism, cisgenderism, and class inequality. Spatial health research documents that place shapes health, but existing methods cannot specify the mechanisms through which spatial configurations produce different health outcomes for differently positioned people. This gap prevents targeted intervention. Objective: To develop and pilot test the Spatial Intersectionality Health Framework (SIHF), which specifies three mechanisms through which space produces intersectional health inequities: Layered (multiple oppressive systems activating simultaneously), Positional (the same space producing different health pathways by intersectional position), and Conditional (nominally protective spaces carrying hidden costs for specific positions). We also introduce and validate Intersectional Geographically-Explicit Ecological Momentary Assessment (IGEMA) as the methodology operationalizing SIHF across three data levels. Methods: The GeoSense study enrolled 32 young sexual and gender minorities of color (ages 18-29) in New York City. IGEMA was implemented across three integrated levels: (1) GPS mobility tracking via participants' personal smartphones, linked to census tract structural exposure indices across n=19 participants; (2) ecological momentary assessment of intersectional discrimination with multilevel modeling of mood, stress, and sleep outcomes; and (3) map-guided qualitative interviews with SIHF mechanism coding and intercoder reliability assessment across 92 coded records from 18 participants. This study was conducted as the pilot for NIH R01HL169503. Results: All three SIHF mechanisms were empirically detectable. A compound structural gendered racism index outperformed every single-axis alternative in predicting daily mood (b=-0.048, p=.001) and stress (b=0.121, p<.001). The Positional mechanism accounted for 71% of coded harm experiences.
Intercoder reliability for mechanism assignment reached kappa=0.824 at Stage 2 reconciliation. Daily intersectional discrimination predicted greater sleep disturbance (b=1.308, p=.004). Conclusions: SIHF and IGEMA together provide an empirically testable framework for specifying how space produces intersectional health inequities. Mechanism specification, not spatial location alone, is the condition for designing research and intervention that reaches the source of harm for multiply marginalized populations.
Hammarlund, N.; Wang, X.; Grant, D.; Purves, D.
Importance: Health systems are increasingly adopting race-neutral cardiovascular risk prediction tools, yet no study has examined how these choices redistribute preventive treatment at the point of clinical decision-making, particularly for Black individuals who already bear a disproportionate cardiovascular burden. Objective: To evaluate how including race, substituting social determinants of health (SDoH), or excluding both reshapes cardiovascular risk classification, calibration, fairness, and clinical decisions. Design: Retrospective cohort study with repeated cross-validation and integrated decision-focused evaluation, using CARDIA study data with baseline measures from 2010 and cardiovascular outcomes through 2021. Setting: Community-based longitudinal cohort recruited across multiple U.S. cities. Participants: 3,241 Black and White adults without known cardiovascular disease at baseline. Main Outcomes and Measures: Three models predicting 10-year incident cardiovascular disease were compared on predictive performance, calibration, fairness metrics, and realized clinical utility at the ACC/AHA 7.5% preventive treatment threshold. Results: Among 3,241 participants (46% Black, mean age 50 years, 6.9% CVD incidence), overall performance was similar across models (AUC 0.762 to 0.768). Predictor choice substantially reshaped clinical decisions at the guideline threshold. The SDoH-based model improved parity metrics but produced systematic underprediction and concentrated new overtreatment among Black participants. The clinical-only model further improved parity metrics but generated new undertreatment, with four cases of untreated CVD and none avoided. No single evaluative dimension captured the full equity consequences. Conclusions and Relevance: Parity metrics improved under both race-neutral models, yet both produced clinical harms concentrated among Black participants not apparent in population-average metrics. 
The case for race removal has rested on conceptual grounds, but comprehensive empirical evaluation is necessary before health systems can be confident their model choices truly serve those most at risk.
Danon, L.; Brooks-Pollock, E.
Background: Social contact surveys, which measure who-contacts-whom, are widely used to inform infectious disease transmission models and estimate the reproduction number (R), a key metric for assessing epidemic risk. Despite their widespread use, sample size calculations are not routinely performed. Aims: To assess the impact of sample size on estimates of R and determine a practical target sample size for social contact surveys used in epidemic modelling. Methods: We conducted a review of social contact surveys (2008-2025) to characterise current practice. We characterised the impact of survey size on epidemic metrics using two social contact surveys, the UK Social Contact Survey and POLYMOD (Europe), and two methods. For each dataset and approach, we generated repeated subsamples and calculated the resulting reproduction numbers, characterised their distributions and measured uncertainty. Results: We identified 107 unique social contact surveys from 57 studies. Sample sizes ranged from 30 to more than 10,000 participants, with a median of 1,438. One quarter of surveys contained fewer than 1,000 participants. From our simulations, we find that sample sizes below 200 individuals can result in highly variable reproduction numbers. Increasing sample size increases precision, with the most meaningful gains occurring up to around 1,300 individuals; increasing sample sizes beyond 3,000 individuals yields smaller gains. Conclusions: A minimum sample size of approximately 1,200-1,300 participants appears sufficient for general-purpose use. These findings support the inclusion of sample size considerations in the design, reporting and interpretation of social contact surveys used for epidemic intelligence and public health decision-making.
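The subsampling experiment can be mimicked with a synthetic contact distribution. Here R is proxied as q × mean contacts for an assumed per-contact transmission probability q, which is a deliberate simplification of the two estimation methods the study actually uses:

```python
import random
import statistics

# How does the spread of an R estimate shrink with survey sample size?
# The contact distribution (lognormal) and transmission probability q
# are synthetic assumptions, not fitted to POLYMOD or the UK survey.

random.seed(42)
population = [random.lognormvariate(2.0, 0.8) for _ in range(20_000)]
q = 0.05  # assumed per-contact transmission probability

def r_estimates(sample_size, reps=200):
    """Repeatedly subsample the survey and recompute R = q * mean contacts."""
    return [q * statistics.mean(random.sample(population, sample_size))
            for _ in range(reps)]

for n in (50, 200, 1300, 3000):
    est = r_estimates(n)
    print(f"n={n}: sd of R estimate = {statistics.stdev(est):.3f}")
```

The standard deviation of the R estimate falls roughly with the square root of the sample size, which is consistent with the pattern the abstract reports: large gains up to roughly a thousand participants and diminishing returns beyond a few thousand.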
Fitzgerald, O.; Keller, E.; Illingworth, P.; Lieberman, D.; Peate, M.; Kotevski, D.; Paul, R.; Rodino, I.; Parle, A.; Hammarberg, K.; Copp, T.; Chambers, G. M.
Study question: What are the characteristics and treatment outcomes of women who undertook planned egg freezing (PEF) in Australia and New Zealand between 2009 and 2023? Summary answer: There has been an average yearly increase in the uptake of PEF of 35%, with most women undergoing a single PEF procedure in their mid-thirties. Given ten years' follow-up, a little over one in four women return, with nearly half of those using donor sperm and one-third achieving a live birth. What is known already: PEF, where women freeze their eggs as a strategy to preserve fertility, has increased dramatically in high-income countries in the last decade. Despite the rapid uptake of PEF, there remains limited information to guide women, clinicians and policy makers regarding the characteristics of women undertaking this procedure and treatment outcomes. Study design, size, duration: A retrospective population-based cohort study of all women who undertook PEF in Australia and New Zealand between 2009 and 2023, including their subsequent return to thaw their eggs and treatment outcomes. Where women returned to utilise their eggs, all subsequent embryo transfer procedures were linked, enabling calculation of live birth rates per woman. Participants/materials, setting, methods: 20,209 women who undertook PEF in Australia and New Zealand between 2009 and 2023, including 1,657 women who returned to thaw their eggs. Main results and the role of chance: There has been a huge increase in uptake of PEF, from 55 women in 2009 to 4,919 in 2023. Women who freeze their eggs are typically aged 34-38 years (interquartile range) and nulliparous (98.6%). For women with at least 10 years of follow-up (i.e. undertook PEF in 2009-13; N=514), 27.9% returned and thawed their frozen eggs (average time to return: 4.9 years). This reduced to 22.1% in those with at least 5 years of follow-up (i.e. undertook PEF in 2009-2018; N=4,288). Of those who used their frozen eggs, 47% used donor sperm.
After at least two years' follow-up, 33.9% had a live birth, rising over time to 37.8% for eggs thawed in 2019-2021. Limitations, reasons for caution: In the timeframe 2009-2019 we did not have information on whether egg freezing occurred because of a cancer diagnosis, a cohort we wished to exclude from the study. As a result, for this timeframe we weighted observations by the probability that egg freezing occurred due to cancer, with the prediction model developed on the years 2020-2023. Wider implications of the findings: This study provides recent and comprehensive data on PEF to guide prospective patients and clinicians and inform policy. The exponential growth in PEF in Australia and New Zealand mirrors trends in other high-income countries, suggesting a doubling time of 2-3 years. Study findings highlight the need for setting realistic expectations about the likelihood of returning to use frozen eggs and live birth rates. Study funding/competing interest(s): 2020-2025 MRFF Emerging Priorities and Consumer Driven Research initiative: EPCD000014
Hernandez, M. A.; Kwong, A. S.; Li, C.; Simpkin, A. J.; Wootton, R. E.; Joinson, C.; Elhakeem, A.
Understanding depressive symptom dynamics and their determinants is crucial for designing effective mental health support initiatives. This study compared two methods for describing youth depressive symptom trajectories and investigated associations of early-life factors (maternal education, maternal perinatal depression, domestic violence, physical, emotional, or sexual abuse, bullying victimisation, psychiatric disorder) with trajectory features. Prospective data from 8,264 mostly White European participants (54% female), including self-reported Short Moods and Feelings Questionnaires on ten occasions between 10-25 years, were used. Trajectories were summarised using functional principal component analysis (FPCA) and P-spline linear mixed-effects (PLME) models. Estimated derivatives were used to obtain the magnitude and age of peak symptoms and of peak symptom velocity. Both methods performed comparably, but PLME models tended to over-smooth trajectories. Peak symptoms and peak velocity were higher and occurred >1 year earlier in females than males. All early-life factors were associated with higher peak symptoms, and most were associated with higher and earlier peak velocity. Abuse and bullying were additionally associated with an earlier age of peak symptoms. FPCA is a useful alternative for characterising depressive symptom trajectories and informing time-sensitive preventative measures to reduce the impact of depression before symptoms reach their peak. Early-life stressors may accelerate the timeline and increase the intensity of symptom escalation during adolescence. Lay summary: Understanding the development of depressive symptoms and the factors shaping them is crucial for designing effective mental health support initiatives.
This study used data from over 8,000 young people regularly followed up from before birth to compare two cutting-edge methods for describing depressive symptoms trajectories and examined how known risk factors for adulthood depression relate to the severity and rate of change of depressive symptoms in adolescence. We found that both methods performed well and that the peaks in depressive symptoms and their rate of change were, on average, higher and occurred over a year earlier in females than males. Our findings additionally suggest that early-life stressors (e.g., abuse, bullying) may accelerate the development of depression, highlighting the importance of early prevention.
Wang, J.; Morrison, J.
Mendelian randomization (MR) uses genetic variants as instrumental variables to infer causal relationships between complex traits. Standard MR can be used to estimate an average causal effect at the population level, and typically assumes a linear exposure-outcome relationship. Recently, several methods for estimating nonlinear effects have been developed. However, many have been found to produce spurious empirical findings when subjected to negative control analyses. We propose that this poor performance may be attributable to heterogeneity in variant-exposure associations. We demonstrate that heterogeneous genetic effects on exposure lead to biased estimates, poor coverage, and inflated type I error in control function and stratification-based methods. In contrast, two-stage least squares (TSLS) methods are robust to such heterogeneity, but suffer from low precision and low power in some circumstances. We show that a statistical test for heterogeneity can be used to guide the choice of nonlinear MR methods. Using UK Biobank data, we reassess the causal effects of BMI, vitamin D, and alcohol consumption on blood pressure, lipids, C-reactive protein, and age (negative control). We find strong evidence of heterogeneity for all three exposures, and also recapitulate previous results that control function and stratification-based methods are prone to false positives. Finally, using nonparametric TSLS, we identify evidence of nonlinear causal effects of BMI on HDL cholesterol, triglycerides, and C-reactive protein; however, specific estimates of the shape of these relationships are imprecise. Altogether, our results suggest that common nonlinear MR methods are unreliable in the presence of realistic levels of heterogeneity, and that more methodological development is required before practically useful nonlinear MR is feasible.
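In its simplest linear form, the TSLS approach favoured here reduces to two stacked regressions: exposure on the instrument, then outcome on the fitted exposure. The sketch below uses simulated data with a known effect of 0.8 and a made-up genetic-score instrument, not UK Biobank:

```python
import numpy as np

# Plain two-stage least squares: the instrument g (a variant dosage)
# affects the exposure x but reaches the outcome y only through x,
# so regressing y on the stage-1 fitted values removes the bias from
# the unobserved confounder u. All data are simulated.

rng = np.random.default_rng(3)
n = 50_000
g = rng.binomial(2, 0.3, n).astype(float)   # instrument
u = rng.normal(0, 1, n)                     # unobserved confounder
x = 0.5 * g + u + rng.normal(0, 1, n)       # exposure
y = 0.8 * x + u + rng.normal(0, 1, n)       # outcome; true effect = 0.8

def tsls(y, x, g):
    Z = np.column_stack([np.ones_like(g), g])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # stage 1 fit
    X = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]     # stage 2 slope

print(f"TSLS estimate: {tsls(y, x, g):.2f}")
```

Here naive OLS of y on x would be biased upward because u enters both equations; TSLS recovers the true 0.8. The nonparametric TSLS in the paper replaces the stage-2 linear term with a flexible function of the fitted exposure.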
Mboya, G. O.
Machine learning models trained on observational data from one environment frequently fail when deployed in another, because standard learning algorithms exploit spurious correlations alongside causal ones. Invariant learning methods address this problem by seeking representations that support stable prediction across training environments, but their behavior on tabular data remains poorly characterized. We present CausTab, a gradient variance regularization framework for causal invariant representation learning on mixed tabular data. CausTab penalizes the variance of parameter gradients across training environments, providing a richer invariance signal than the scalar penalty used by Invariant Risk Minimization (IRM). We provide formal results showing that the gradient variance penalty is zero at causally invariant solutions and positive at solutions that rely on spurious features. Through experiments on synthetic data across three spurious-correlation regimes, four cycles of the National Health and Nutrition Examination Survey (NHANES), and four hospital systems in the UCI Heart Disease dataset, we demonstrate that: (1) IRM consistently degrades relative to standard empirical risk minimization (ERM) on tabular data, losing up to 13.8 AUC points in spurious-dominant settings, a failure we trace mechanistically to penalty collapse during training; (2) CausTab matches or exceeds ERM in every experimental condition; (3) CausTab achieves consistently better probability calibration than both ERM and IRM; and (4) invariant learning methods fail when environments differ in outcome prevalence rather than in spurious feature correlations, a boundary condition we characterize both empirically and theoretically. We introduce the Spurious Dominance Index (SDI), a practical scalar diagnostic for determining whether a dataset requires invariant learning, and validate it across all experimental settings.
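The gradient-variance penalty at the heart of CausTab can be sketched on a toy problem. Everything below is hypothetical (a plain logistic model rather than a learned representation, and synthetic environments), but it shows the claimed behavior: the penalty is small when a model relies on a feature whose relationship to the outcome is stable across environments, and large when it relies on a feature whose correlation flips between them:

```python
import numpy as np

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss for weights w on one environment."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def grad_variance_penalty(w, envs):
    """Variance of per-environment gradients, averaged over parameters.

    envs: list of (X, y) pairs, one per training environment.
    Small when every environment induces a similar gradient at w,
    large when some feature's gradient disagrees across environments.
    """
    grads = np.stack([logistic_grad(w, X, y) for X, y in envs])
    return float(np.mean(np.var(grads, axis=0)))

rng = np.random.default_rng(0)

def make_env(flip, n=500):
    # Causal feature: same relationship to y in every environment.
    x_causal = rng.normal(size=n)
    y = (x_causal + 0.3 * rng.normal(size=n) > 0).astype(float)
    # Spurious feature: its correlation with y flips sign with `flip`.
    x_spur = (2 * y - 1) * flip + 0.5 * rng.normal(size=n)
    return np.column_stack([x_causal, x_spur]), y

envs = [make_env(+1.0), make_env(-1.0)]
pen_causal = grad_variance_penalty(np.array([2.0, 0.0]), envs)  # causal model
pen_spur = grad_variance_penalty(np.array([0.0, 2.0]), envs)    # spurious model
```

The spurious model's penalty dwarfs the causal one's, which is the signal CausTab regularizes against during training.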
Qadeer, A.; Gohar, N.; Maniyar, P.; Shafi, N.; Juarez, L. M.; Mortada, I.; Pack, Q. R.; Jneid, H.; Gaalema, D. E.
Introduction: Smoking cessation after acute coronary syndrome (ACS) is a Class I recommendation, yet prescription pharmacotherapy use remains low and its real-world cardiovascular effectiveness when added to nicotine replacement therapy (NRT) is poorly characterized. Methods: We conducted a retrospective cohort study using the TriNetX US Collaborative Network (67 healthcare organizations). Adults hospitalized with ACS who received NRT within one month, serving as a proxy for active smoking status, were identified. Two co-primary propensity-matched (1:1, 50 covariates, caliper 0.10 SD) comparisons evaluated bupropion + NRT and varenicline + NRT individually versus NRT alone; a supportive analysis evaluated combined pharmacotherapy versus NRT alone. All-cause mortality was the primary endpoint. Secondary outcomes included MACE, heart failure exacerbations, major bleeding, TIA/stroke, emergency rehospitalizations, and cardiac rehabilitation utilization, assessed at 6 months and 1 year via Kaplan-Meier analysis. Hazard ratios (HRs) greater than 1.0 indicate higher hazard in the NRT-only group. Results: After matching, the combined analysis comprised 8,574 pairs, the bupropion analysis 4,654 pairs, and the varenicline analysis 2,126 pairs. At 1 year, the combined pharmacotherapy group had significantly lower all-cause mortality (HR 1.26, 95% CI 1.16-1.37), MACE (HR 1.16, 95% CI 1.12-1.21), heart failure exacerbations (HR 1.16, 95% CI 1.08-1.25), major bleeding (HR 1.18, 95% CI 1.08-1.28), and greater cardiac rehabilitation utilization (HR 0.82, 95% CI 0.74-0.92; all p < 0.001). TIA/stroke did not differ significantly. Six-month results were consistent. Both varenicline and bupropion individually showed lower mortality and MACE. A urinary tract infection falsification endpoint showed no between-group differences, supporting matching validity. The pharmacotherapy group had higher rates of new-onset depression, driven predominantly by bupropion recipients. 
Conclusions: In this propensity-matched real-world analysis, adding prescription smoking cessation pharmacotherapy to NRT after ACS was associated with lower mortality and fewer adverse cardiovascular events, supporting broader integration into post-ACS care pathways.
Garcia Quesada, M.; Wallrafen-Sam, K.; Kiti, M. C.; Ahmed, F.; Aguolu, O. G.; Ahmed, N.; Omer, S. B.; Lopman, B. A.; Jenness, S. M.
Non-pharmaceutical interventions (NPIs) have been important for controlling SARS-CoV-2 transmission, particularly before and during initial vaccine rollout. During the pandemic, the US Centers for Disease Control and Prevention issued isolation and masking guidance in case of COVID-19-like illness, a positive SARS-CoV-2 test, or known exposure to SARS-CoV-2. However, the impact of this guidance on mitigating transmission in office workplaces is unclear. We used a network-based mathematical model to estimate the impact of this guidance on SARS-CoV-2 transmission among office workers and their communities. The model represented social contacts in the home, office, and community. We used data from the CorporateMix study to parametrize social contacts among office workers and calibrated the model to represent the COVID-19 epidemic in Georgia, USA from January 2021 through August 2022. In the reference scenario (58% adherence to guidance among office workers and the broader population), workplace transmission accounted for a small fraction of total infections. Reducing adherence among office workers to 0% increased workplace transmissions by 27.1%, and increasing adherence to 75% reduced workplace transmission by 7.0%. Increasing adherence to 75% among office workers had minimal impact on symptomatic cases and deaths; increasing it among the broader population was more effective in reducing office worker cases and deaths. In our model, moderate adherence to recommended NPIs in workplaces was effective in reducing transmission, but increasing adherence had limited benefit given the low contact intensity and hybrid work arrangements of these workplaces. These results underscore the public health benefits of community-wide adoption of recommended NPIs.
Lin, T.; Li, Y.; Huang, Z.; Gui, T. T.; Wang, W.; Guo, Y.
Target trial emulation (TTE) offers a principled way to estimate treatment effects using real-world observational data, but analyses of time-varying treatment strategies remain vulnerable to immortal time bias. The clone-censor-weight (CCW) approach is increasingly used to address this problem, yet key aspects of its causal interpretation and implementation remain unclear. In this work, we emulate a target trial using electronic health records (EHRs) to compare completion of a 3-dose 9-valent human papillomavirus (HPV) vaccination series within 12 months versus remaining partially vaccinated among vaccine initiators. We link CCW to the classic potential outcome framework in causal inference, evaluate the role of different weighting mechanisms, and account for within-subject correlation induced by cloning using cluster-robust variance estimation. Our study provides practical guidance for applying CCW in real-world comparative effectiveness studies to address immortal time bias and supports more rigorous and interpretable treatment effect estimation in TTE.
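The clone-and-censor step of CCW can be sketched on a toy cohort mirroring the abstract's HPV example. All column names and values below are illustrative, and the sketch stops before the weighting step (inverse probability of censoring weights estimated from a censoring model, followed by cluster-robust variance by subject):

```python
import pandas as pd

# Toy cohort of HPV vaccine initiators (illustrative data only).
# month_completed: month the 3-dose series was completed (None if never)
cohort = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "month_completed": [6, None, 14, 9],
    "followup_months": [24, 24, 24, 24],
})

def clone_censor(cohort, grace=12):
    """Clone each subject into both strategy arms, then censor each clone
    at the time its observed behavior first contradicts its assigned arm.
    """
    rows = []
    for _, r in cohort.iterrows():
        completed_in_grace = (pd.notna(r["month_completed"])
                              and r["month_completed"] <= grace)
        for arm in ("complete_within_12m", "remain_partial"):
            if arm == "complete_within_12m":
                # Censored at the end of the grace period if still partial.
                censor_t = None if completed_in_grace else grace
            else:
                # Censored the moment the series is actually completed.
                censor_t = (r["month_completed"]
                            if pd.notna(r["month_completed"]) else None)
            rows.append({"id": r["id"], "arm": arm, "censor_month": censor_t})
    return pd.DataFrame(rows)

clones = clone_censor(cohort)
```

Because every subject contributes a clone to each arm, person-time before any deviation is shared between strategies, removing the immortal time that a naive completed-vs-not comparison would introduce; the induced within-subject correlation is what motivates the paper's cluster-robust variance estimation.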
O'Connor, M.; O'Connor, E.; Hughes, E. K.; Bann, D.; Knight, K.; Tabor, E.; Bridger-Staatz, C.; Gray, S.; Burgner, D.; Olsson, C. A.
Background: Population-based cohort studies are increasingly expected to demonstrate benefits for public health and wider society. However, there is limited systematic evidence on what such impact entails or how it is generated and sustained. To address this gap, we examined researcher perspectives on the impact of cohort studies. Methods: We conducted, to our knowledge, the first quantitative study of researcher views on cohort impact, recruiting active cohort researchers through national and international networks between August and December 2025. The anonymous cross-sectional survey captured researcher characteristics, perceived contributions, impact processes, challenges, and open-ended reflections. Results: A total of 163 cohort researchers participated, primarily from Australia (42%) and the UK (23%). Participants perceived their work as informing a wide range of societal issues and reported investing an average of 24% of their work time in impact-related activities. While most respondents (73%) believed their research leads to tangible policy or practice change, two thirds indicated that impact is rarely or never demonstrable shortly after study completion (67%) and seldom attributable to a single study (67%). Key concerns included pressure to overstate contributions (80%), perceived disadvantages for cohort studies in impact assessments (78%), and inadequate skills or resources to achieve impact (65%). Conclusions: Cohort researchers perceive their work as generating broad societal contributions and invest substantial effort in supporting impact. However, they face systemic challenges in both achieving and demonstrating impact. These findings highlight the need for impact frameworks that better capture complexity, long-term influence, and cumulative contributions, while mitigating unintended consequences.