
Epidemiology

26 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Novel Representations of Vaccine Protection Against Progression to Severe Disease Over Time
2026-02-14 epidemiology 10.64898/2026.02.12.26346197
#1 (8.4%)

Background: Vaccines can prevent severe disease by preventing infection or by reducing progression among those who become infected. Vaccine effectiveness against progression given infection is often used to quantify this second mechanism, but it conditions on infection, which is itself affected by vaccination. As a result, this estimand lacks a clear causal interpretation and may behave non-intuitively over time. Methods: We introduce a conceptual framework that models protection against infection ...

2
An E-value-Informed Sensitivity Analysis Framework for Hybrid Controlled Trials
2026-03-06 epidemiology 10.64898/2026.03.05.26347653
#1 (5.1%)

Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-...

3
Apparent RSV-COVID interference is not robust to adjustment for shared testing propensity
2025-12-30 epidemiology 10.64898/2025.12.30.25343230
Top 0.1% (3.0%)

Background: Viral interference, in which infection with one pathogen reduces susceptibility to another, may influence respiratory virus dynamics. Inference from surveillance data is complicated by time-varying testing behavior that can induce correlated detection patterns independent of biological interaction. Methods: We developed a multi-pathogen renewal model augmented with a ratio penalty that constrains interference estimates to be consistent with observed log-odds ratios of pathogen positivit...

4
Correcting for effect modification in the doubly-ranked non-linear Mendelian randomization method
2026-01-23 epidemiology 10.64898/2026.01.22.26344640
Top 0.1% (2.1%)

The doubly-ranked non-linear Mendelian randomization method can yield biased estimates when instrument strength varies across individuals due to gene-environment (GxE) interactions. We propose a simple strategy to mitigate this bias by modelling GxE interactions and removing the fitted GxE component from the exposure before stratification by the doubly-ranked method. In simulations, the proposed GxE correction strategy eliminated GxE-induced bias with null, linear and non-linear exposure-outcome...

5
An integrated analytical framework for gender-based violence research: A simulation study combining machine learning and causal inference
2025-12-18 epidemiology 10.64898/2025.12.15.25342247
Top 0.2% (1.9%)

Current research on Gender-Based Violence (GBV) typically separates predictive machine learning and causal inference into distinct analytical silos. Yet, grasping the multi-level determinants of violence requires an approach that can both identify high-value predictors and disentangle their causal mechanisms. This study develops and validates an integrated five-phase analytical framework to demonstrate how combining machine learning with structural causal modeling improves the detection of deter...

6
Constructing and analyzing a synthetic life course cohort based on pooling two data sources: A case study of early adulthood depression symptomatology and late-life cognition
2026-02-27 epidemiology 10.64898/2026.02.25.26347113
Top 0.2% (1.8%)

Background: Synthetic cohorts created by combining two cohorts can be useful when no single data set includes both the exposure and outcome data of interest. We estimate the effects of depression in early adulthood on later-life memory outcomes using two nationally representative cohorts separately and in a synthetic sample. Methods: We used the National Longitudinal Survey of Youth 1979 (NLSY; N=5,747) and the Health and Retirement Study (HRS; N=6,846) and a synthetic cohort combining exposure data ...

7
Estimating age patterns and grouped temporal trends in human contact patterns with Bayesian P-splines
2026-01-23 epidemiology 10.64898/2026.01.21.26344589
Top 0.3% (1.6%)

This paper presents a smoothing method to estimate age-specific human contact patterns and their variations over different periods. Specifically, it examines how age-specific contact patterns shift under varying conditions, such as holiday periods and levels of public health intervention. The method uses Bayesian P-splines to smooth age-specific contact rates and leverages Laplace approximations for fast Bayesian inference, significantly reducing computational complexity. The proposed methodolog...

8
Enhanced screening and bacterial sexually-transmitted infection diagnoses after HIV pre-exposure prophylaxis initiation
2025-12-29 epidemiology 10.64898/2025.12.19.25342713
Top 0.3% (1.5%)

Background: Recipients of HIV pre-exposure prophylaxis (PrEP) experience higher rates of gonorrhea and chlamydia diagnoses than non-recipients. However, it is unclear if these observations reflect a causal relationship between PrEP initiation and acquisition of sexually transmitted infections (i.e., behavioral "risk compensation"), or alternatively a diagnostic bias related to PrEP recipients screening more frequently than non-recipients. Methods: We conducted a self-controlled case series study co...

9
A bootstrap particle filter for viral Rt inference and forecasting using wastewater data
2026-03-06 epidemiology 10.64898/2026.03.06.26347747
Top 0.3% (1.5%)

Wastewater is increasingly being recognized as an important data stream that can contribute to infectious disease surveillance and forecasting. With this recognition, a growing number of statistical inference approaches are being developed to use wastewater data to provide quantitative insights into epidemiological dynamics. However, few existing approaches have allowed for systematic integration of data streams for inference, for example by combining case incidence data and/or serological data ...

10
Integrating stakeholder perspectives in modeling routine data for therapeutic decision-making
2026-02-18 epidemiology 10.64898/2026.02.18.26346074
Top 0.3% (1.5%)

Background: Routinely collected health data are increasingly used to generate real-world evidence for therapeutic decision-making. Yet, stakeholders, including clinicians, pharmaceutical industry representatives, patient advocacy groups, and statisticians, prioritize different aspects of data quality, analysis, and interpretation. Without explicit consideration of these perspectives, analyses risk being fragmented, misaligned with end-user needs, or lacking transparency. Methods: We developed a sta...

11
A Beta Regression Framework with Intentional Left-Censoring for Quantifying Familial Longevity
2026-01-15 epidemiology 10.64898/2026.01.13.26343996
Top 0.3% (1.5%)

Defining and quantifying exceptional familial human survival is a persistent challenge in longevity research. Traditional approaches rely on binary thresholds, arbitrary cutoffs, or simple descriptive measures, which discard information on variation among the oldest individuals, ignore differences in background mortality, and yield unstable family-level summaries. We propose a principled, model-based framework that transforms survival times into percentiles relative to population life tables, st...

12
Accelerating vaccine trials during an outbreak of Disease-X: the effect of pathogen super-spreading on ring-trial design
2026-02-18 epidemiology 10.64898/2026.02.17.26346480
Top 0.3% (1.5%)

The prospective design of vaccine efficacy trials for deployment in outbreaks requires advance consideration of plausible outbreak scenarios, anticipated vaccine characteristics, and logistical and ethical constraints. As part of CEPI's 100 Days Mission to accelerate vaccine development against a novel Disease X, we evaluated trial designs for a hypothetical Nipah-X outbreak. We assumed Nipah-X would share key features with Nipah, including high case fatality rates and substantial super-spreading...

13
Spatial Clustering of School Susceptibles Drives Divergent US Measles Outbreaks
2026-02-27 epidemiology 10.64898/2026.02.25.26347103
Top 0.4% (1.4%)

The two largest US measles outbreaks in over two decades (2025 Gaines County, Texas: 414 cases, contained; 2025-2026 Spartanburg County, South Carolina: 923+ cases, ongoing) occurred in counties with similar sub-threshold K-12 MMR coverage (85.1% vs 88.8%), yet their trajectories diverged dramatically. Using kernel density estimation with a common bandwidth and bootstrap uncertainty quantification, we compared sub-county vaccination data at the district level for Texas (3 districts, 3,560 studen...

14
Methodological Guidance for Predictor Variable Selection for Adolescent Smoking Outcomes in Global Youth Tobacco Survey Using R and Python
2026-02-17 epidemiology 10.64898/2026.02.14.26346305
Top 0.4% (1.4%)

Background: The Global Youth Tobacco Survey (GYTS) is widely used to monitor tobacco use among adolescents worldwide. However, inconsistent analytical approaches, particularly in handling complex survey designs and predictor selection, limit comparability across countries, survey waves, and software platforms. Although much of the GYTS literature relies on proprietary tools such as SAS and SPSS, practical and transparent guidance on implementing reproducible, theory-informed analyses remains limited...

15
Uncovering identifiability of epidemiological models: basic reproduction number and complementary data streams
2026-01-19 epidemiology 10.64898/2026.01.16.26344284
Top 0.4% (1.4%)

Mathematical models of infectious disease dynamics are routinely fitted to surveillance data to estimate epidemiological parameters and inform public health decisions. Such data are typically discrete and noisy, but before attempting estimation, it is essential to ask whether the model structure itself permits unique parameter identification, at least under perfect (continuous, noise-free) observations. This mathematical property of a model with respect to observation(s), known as structural iden...

16
Planning robust clinical trials for Shigella vaccines: A simulation-based evaluation of the impact of naturally-acquired immunity on vaccine performance
2025-12-12 epidemiology 10.64898/2025.12.11.25341848
Top 0.4% (1.3%)

Background: Several Shigella vaccine candidates are in late stages of development, and the design of large Phase 3 trials in target populations is underway. Immunologic catch-up by unvaccinated infants to vaccinated infants, which is determined by the trial site-specific force of infection, may modify the vaccine efficacy (VE) estimates observed in such trials. To set expectations and support optimal planning of future Shigella vaccine trials, we aimed to quantify the potential bias of VE estimate...

17
Study protocol for estimating modern US social contact patterns: the ENGAGED study
2026-01-11 epidemiology 10.64898/2026.01.08.26343704
Top 0.4% (1.3%)

Background: Accurately capturing social contact data is essential for developing effective mathematical models to forecast disease trends and evaluate interventions. Population-based data on social contacts in the United States (US) are limited, which restricts our ability to accurately model infectious disease transmission. Methods: To fill this gap, we conducted a staggered longitudinal cohort study in metropolitan Atlanta, Georgia, USA. We aimed to characterize contact patterns and examine ...

18
Using Negative Control Outcomes to Detect Selection Bias in Mendelian Randomization Studies
2026-02-01 epidemiology 10.64898/2026.01.30.26345215
Top 0.5% (1.3%)

Mendelian randomization is currently implemented mainly by using genetic variants as instrumental variables to investigate the causal effect of an exposure on an outcome of interest. Mendelian randomization studies are robust to confounding bias and reverse causation, but they remain susceptible to selection bias; for example, this can happen if the exposure or outcome is associated with selection into the study sample. Negative controls are sometimes used to detect biases (typically ...

19
Comparative performance of the concurrent comparator design with existing vaccine safety surveillance approaches on real-world observational health data
2026-01-26 public and global health 10.64898/2026.01.25.26344812
Top 0.5% (1.3%)

Background: It is a critical public health concern to identify safety signals originating from wide-scale immunization efforts. Such safety signals may be identified from spontaneous reports and other data sources. Although some work has been done on the best methods for vaccine safety surveillance, there is a scarcity of information on how these perform in analyses of real-world data. Methods: We use four administrative claims databases and one electronic health record (EHR) database to evaluate the...

20
Characterizing the impact of the COVID-19 pandemic on HIV testing among Medicaid beneficiaries
2026-02-14 epidemiology 10.64898/2026.02.12.26346199
Top 0.5% (1.3%)

Objectives: Estimate the HIV testing, diagnoses, and test positivity rates among Medicaid beneficiaries in 2016-2021 and assess the impact of the COVID-19 pandemic on these outcomes. Design: Prospective observational study of Medicaid enrollment, inpatient, and outpatient claims data from 27 states, 2016-2021. Methods: We assessed Medicaid claims from adult beneficiaries with full benefits whose first continuous enrollment was ≥6 months without dual enrollment in other insurance, and without pr...