Epidemiology
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Background: Vaccines can prevent severe disease by preventing infection or by reducing progression among those who become infected. Vaccine effectiveness against progression given infection is often used to quantify this second mechanism, but it conditions on infection, which is itself affected by vaccination. As a result, this estimand lacks a clear causal interpretation and may behave non-intuitively over time. Methods: We introduce a conceptual framework that models protection against infection ...
Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-...
Background: Viral interference, in which infection with one pathogen reduces susceptibility to another, may influence respiratory virus dynamics. Inference from surveillance data is complicated by time-varying testing behavior that can induce correlated detection patterns independent of biological interaction. Methods: We developed a multi-pathogen renewal model augmented with a ratio penalty that constrains interference estimates to be consistent with observed log-odds ratios of pathogen positivit...
The doubly-ranked non-linear Mendelian randomization method can yield biased estimates when instrument strength varies across individuals due to gene-environment (GxE) interactions. We propose a simple strategy to mitigate this bias by modelling GxE interactions and removing the fitted GxE component from the exposure before stratification by the doubly-ranked method. In simulations, the proposed GxE correction strategy eliminated GxE-induced bias with null, linear and non-linear exposure-outcome...
Current research on Gender-Based Violence (GBV) typically separates predictive machine learning and causal inference into distinct analytical silos. Yet, grasping the multi-level determinants of violence requires an approach that can both identify high-value predictors and disentangle their causal mechanisms. This study develops and validates an integrated five-phase analytical framework to demonstrate how combining machine learning with structural causal modeling improves the detection of deter...
Background: Synthetic cohorts created by combining two cohorts can be useful when no single data set includes both the exposure and outcome data of interest. We estimate the effects of depression in early adulthood on later-life memory outcomes using two nationally representative cohorts separately and in a synthetic sample. Methods: We used the National Longitudinal Survey of Youth 1979 (NLSY; N=5,747) and the Health and Retirement Study (HRS; N=6,846) and a synthetic cohort combining exposure data ...
This paper presents a smoothing method to estimate age-specific human contact patterns and their variations over different periods. Specifically, it examines how age-specific contact patterns shift under varying conditions, such as holiday periods and levels of public health intervention. The method uses Bayesian P-splines to smooth age-specific contact rates and leverages Laplace approximations for fast Bayesian inference, significantly reducing computational complexity. The proposed methodolog...
Background: Recipients of HIV pre-exposure prophylaxis (PrEP) experience higher rates of gonorrhea and chlamydia diagnoses than non-recipients. However, it is unclear if these observations reflect a causal relationship between PrEP initiation and acquisition of sexually transmitted infections (i.e., behavioral "risk compensation"), or alternatively a diagnostic bias related to PrEP recipients screening more frequently than non-recipients. Methods: We conducted a self-controlled case series study co...
Wastewater is increasingly being recognized as an important data stream that can contribute to infectious disease surveillance and forecasting. With this recognition, a growing number of statistical inference approaches are being developed to use wastewater data to provide quantitative insights into epidemiological dynamics. However, few existing approaches have allowed for systematic integration of data streams for inference, for example by combining case incidence data and/or serological data ...
Background: Routinely collected health data are increasingly used to generate real-world evidence for therapeutic decision-making. Yet, stakeholders, including clinicians, pharmaceutical industry representatives, patient advocacy groups, and statisticians, prioritize different aspects of data quality, analysis, and interpretation. Without explicit consideration of these perspectives, analyses risk being fragmented, misaligned with end-user needs, or lacking transparency. Methods: We developed a sta...
Defining and quantifying exceptional familial human survival is a persistent challenge in longevity research. Traditional approaches rely on binary thresholds, arbitrary cutoffs, or simple descriptive measures, which discard information on variation among the oldest individuals, ignore differences in background mortality, and yield unstable family-level summaries. We propose a principled, model-based framework that transforms survival times into percentiles relative to population life tables, st...
The prospective design of vaccine efficacy trials for deployment in outbreaks requires advance consideration of plausible outbreak scenarios, anticipated vaccine characteristics, and logistical and ethical constraints. As part of CEPI's 100 Days Mission to accelerate vaccine development against a novel Disease X, we evaluated trial designs for a hypothetical Nipah-X outbreak. We assumed Nipah-X would share key features with Nipah, including high case fatality rates and substantial super-spreading...
The two largest US measles outbreaks in over two decades (2025 Gaines County, Texas: 414 cases, contained; 2025-2026 Spartanburg County, South Carolina: 923+ cases, ongoing) occurred in counties with similar sub-threshold K-12 MMR coverage (85.1% vs 88.8%), yet their trajectories diverged dramatically. Using kernel density estimation with a common bandwidth and bootstrap uncertainty quantification, we compared sub-county vaccination data at the district level for Texas (3 districts, 3,560 studen...
Background: The Global Youth Tobacco Survey (GYTS) is widely used to monitor tobacco use among adolescents worldwide. However, inconsistent analytical approaches, particularly in handling complex survey designs and predictor selection, limit comparability across countries, survey waves, and software platforms. Although much of the GYTS literature relies on proprietary tools such as SAS and SPSS, practical and transparent guidance on implementing reproducible, theory-informed analyses remains limited...
Mathematical models of infectious disease dynamics are routinely fitted to surveillance data to estimate epidemiological parameters and inform public health decisions. Such data are typically discrete and noisy, but before attempting estimation, it is essential to ask whether the model structure itself permits unique parameter identification at least under perfect (continuous, noise-free) observations. This mathematical property of a model with respect to observation(s), known as structural iden...
Background: Several Shigella vaccine candidates are in late stages of development, and the design of large Phase 3 trials in target populations is underway. Immunologic catch-up by unvaccinated infants to vaccinated infants, which is determined by the trial site-specific force of infection, may modify the vaccine efficacy (VE) estimates observed in such trials. To set expectations and support optimal planning of future Shigella vaccine trials, we aimed to quantify the potential bias of VE estimate...
Background: Accurately capturing social contact data is essential for developing effective mathematical models to forecast disease trends and evaluate interventions. Population-based data on social contacts in the United States (US) are limited, which restricts our ability to accurately model infectious disease transmission. Methods: To fill this gap, we conducted a staggered longitudinal cohort study in metropolitan Atlanta, Georgia, USA. We aimed to characterize contact patterns and examine ...
Mendelian randomization is currently mainly implemented through the use of genetic variants as instrumental variables to investigate the causal effect of an exposure on an outcome of interest. Mendelian randomization studies are robust to confounding bias and reverse causation, but they remain susceptible to selection bias; for example, this can happen if the exposure or outcome are associated with selection into the study sample. Negative controls are sometimes used to detect biases (typically ...
Background: It is a critical public health concern to identify safety signals originating from wide-scale immunization efforts. Such safety signals may be identified from spontaneous reports and other data sources. Although some work has been done on the best methods for vaccine safety surveillance, there is a scarcity of information on how these perform in analyses of real-world data. Methods: We use four administrative claims databases and one electronic health record (EHR) database to evaluate the...
Objectives: Estimate the HIV testing, diagnoses, and test positivity rates among Medicaid beneficiaries in 2016-2021 and assess the impact of the COVID-19 pandemic on these outcomes. Design: Prospective observational study of Medicaid enrollment, inpatient, and outpatient claims data from 27 states, 2016-2021. Methods: We assessed Medicaid claims from adult beneficiaries with full benefits whose first continuous enrollment was ≥6 months without dual enrollment in other insurance, and without pr...