Epidemics — Latest Matching Preprints

1

A formula for the basic reproduction number of an infectious disease in a heterogeneous population with structured mixing

Colman, E.; Chatzilena, A.; Prasse, B.; Danon, L.; Brooks Pollock, E.

2026-03-30 epidemiology 10.64898/2026.03.27.26349419 medRxiv

Top 0.1%

22.7%

The basic reproduction number of an infectious disease is known to depend on the structure of contacts between individuals in a population. This relationship has been explored mathematically through two well-known models: one which depends on a matrix of contact rates between different demographic groups, and another which depends on the variability of contact rates over the population. Here we introduce a model that combines and generalises these two approaches. We derive a formula for the basic reproduction number and validate it through comparisons to simulated outbreaks. Applying this method to contact survey data collected in Belgium between 2020 and 2022, we find that our model produces higher estimates of the basic reproduction number and larger relative changes over periods when social contact behaviour was changing during the COVID-19 pandemic. Our analysis suggests some practical considerations when using contact data in models of infectious disease transmission.
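
As a rough illustration of the two classical approaches this abstract refers to (and not the combined formula derived in the preprint), the sketch below computes a reproduction number first as the dominant eigenvalue of a next-generation matrix built from a group-level contact matrix, and then from the first two moments of individual contact counts. The contact values, the per-contact transmission probability `q`, and all function names are hypothetical.

```python
from statistics import mean


def dominant_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a positive square matrix, by power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        value = max(new)  # the normalising constant converges to the eigenvalue
        vec = [x / value for x in new]
    return value


def r0_from_contact_matrix(contacts, q):
    """Group-structured approach: R0 as the dominant eigenvalue of q * C."""
    next_generation = [[q * c for c in row] for row in contacts]
    return dominant_eigenvalue(next_generation)


def r0_from_heterogeneity(contact_counts, q):
    """Contact-variability approach: R0 = q * E[k^2] / E[k]."""
    first_moment = mean(contact_counts)
    second_moment = mean([k * k for k in contact_counts])
    return q * second_moment / first_moment


if __name__ == "__main__":
    # Hypothetical two-group contact matrix and individual contact counts.
    print(r0_from_contact_matrix([[4, 1], [2, 3]], q=0.1))
    print(r0_from_heterogeneity([1, 1, 1, 9], q=0.1))
    print(r0_from_heterogeneity([3, 3, 3, 3], q=0.1))
```

In the second approach, two populations with the same mean number of contacts can give different values, because the second moment rewards variability in contact counts.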

2

A Machine Learning Framework for Constructing Heterogeneous Contact Networks: Implications for Epidemic Modelling

Murray Kearney, L.; Davis, E. L.; Keeling, M. J.

2026-03-16 epidemiology 10.64898/2026.03.14.26348396 medRxiv

Top 0.1%

22.4%

Capturing the structured mixing within a population is key to the reliable projection of infectious disease dynamics and hence informed control. Both heterogeneity in the number of epidemiologically-relevant contacts and age-structured mixing have been repeatedly demonstrated as fundamental, yet are rarely combined. Networks provide a powerful and intuitive method to realise these two elements of population structure, and simulate infection dynamics. While there are a few key examples of contact networks being measured explicitly, this is not scalable to larger populations, where representative networks must be constructed from more ubiquitous individual-level data. Here, using data from social contact surveys, we develop a generalisable and robust algorithm utilizing machine learning to generate a surrogate population-scale network that preserves both age-structured mixing and heterogeneity of contacts. For different datasets and network construction assumptions we simulate the spread of infection, considering how the epidemic size varies over basic reproduction number (R0) scenarios - mirroring the process of determining public health impact from early epidemic growth. Our approach shows that both age structure and degree heterogeneity substantially reduce the epidemic size (for a given R0) compared to simpler models. We also demonstrate that these simulations more accurately re-capture the heterogeneity in secondary cases that has been observed, when transmission is scaled by contact duration to dampen the effect of highly connected nodes ("super-spreaders"). By using survey data collected during 2020-2022, these network models also inform about the impacts of control and targeting of public health interventions: quantifying the non-linear reduction in transmission opportunities that occurred during lockdowns, and the ages and contact types most responsible for onward transmission. 
Our robust methodology therefore allows the full wealth of data commonly collected by surveys, but frequently overlooked, to be incorporated into more realistic transmission models of infectious diseases.

3

Estimating the strength of symptom propagation from primary-secondary case pair data

Asplin, P.; Mancy, R.; Keeling, M. J.; Hill, E. M.

2026-04-13 infectious diseases 10.64898/2026.04.07.26350037 medRxiv

Top 0.1%

22.1%

Symptom propagation occurs when the symptoms of secondary cases are related to those of the primary case as a result of epidemiological mechanisms. Determining whether - and to what extent - symptom propagation occurs requires data-driven methods. Here we quantify the strength of symptom propagation as the increase in risk of a secondary case developing severe symptoms if the primary case has severe symptoms. We first used synthetic results to determine the data requirements to robustly estimate the strength of symptom propagation and to investigate the effect of severity-dependent reporting bias. Categorising symptom severity into two groups (mild or severe; asymptomatic or symptomatic), our estimation requires only four summary statistics - the number of primary-secondary case pairs of each combination of symptom presentations. Our analysis showed that a relatively small number (100) of synthetic primary-secondary case pairs was sufficient to obtain a reasonable estimate of the strength of symptom propagation and 1,000 pairs meant errors were consistently small across replicates. Our estimates were robust to severity-dependent reporting bias. We also explored how symptom propagation can be separated from other individual-level factors affecting severity, using age dependence as an example. Although synthetic data generated from an age-structured model led to overestimations of the strength of symptom propagation, allowing disease severity to be age-dependent restored the accuracy of parameter estimation. Finally, we applied our methodology to estimate the strength of symptom propagation from three publicly available data sets collected during the COVID-19 pandemic that record the presence or absence of symptoms: England households, Israel households, and Norway contact tracing. Our age-free methodology indicated a 12-17% increase in the risk of being symptomatic if infected by someone symptomatic. 
Our positive estimates for the strength of symptom propagation persisted when applying our age-dependent methodology to the two household data sets with age-structured information (England and Israel). These findings demonstrate evidence for symptom propagation of SARS-CoV-2 and provide consistent estimates for its strength. Our synthetic data analysis supports the conclusion that these correlations are not a result of reporting bias or age-dependent effects. This work provides a practical tool for estimating the strength of symptom propagation that has minimal data requirements, enabling application across a wide range of pathogens and epidemiological settings.
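
To make the "four summary statistics" concrete, here is a minimal sketch that takes the counts of primary-secondary pairs in each combination of symptom presentations and returns a naive risk difference and risk ratio. This is a plain 2x2-table calculation under the assumption of unbiased reporting, not the estimator developed in the preprint; the counts and names are hypothetical.

```python
def naive_symptom_propagation(n_ss, n_sm, n_ms, n_mm):
    """Naive comparison of secondary-case risk by primary-case presentation.

    n_xy is the number of pairs whose primary case has presentation x and
    whose secondary case has presentation y (s = severe, m = mild).
    """
    risk_after_severe = n_ss / (n_ss + n_sm)  # P(secondary severe | primary severe)
    risk_after_mild = n_ms / (n_ms + n_mm)    # P(secondary severe | primary mild)
    risk_difference = risk_after_severe - risk_after_mild
    risk_ratio = risk_after_severe / risk_after_mild
    return risk_difference, risk_ratio


if __name__ == "__main__":
    # Hypothetical counts for 100 primary-secondary pairs.
    difference, ratio = naive_symptom_propagation(30, 20, 20, 30)
    print(difference, ratio)
```

A positive difference (or a ratio above one) is what symptom propagation would produce, although confounders such as age can produce the same pattern, which is the issue the abstract's age-dependent analysis addresses.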

4

Using Bayesian Evidence Synthesis to estimate the number of sex workers in the United Kingdom

Long, H.; Gada, L.; Murray, L.; Laurence, T.; Hayward, A.; Finnie, T.

2026-05-26 public and global health 10.64898/2026.05.21.26353767 medRxiv

Top 0.1%

19.2%

Sex work is diverse and includes a broad range of people and settings. Over the last thirty years, a large proportion of public health emergencies of international concern (PHEIC) have involved infections transmitted through sexual or close contact and in sexual networks (WHO 2024). Sex workers can face increased disadvantage in relation to these public health emergencies. Given the significant health inequalities sex workers can face, they should be eligible to receive targeted and tailored health support to reduce health protection risks (Hester 2019; Jeal and Salisbury 2004a). However, they are often not explicitly eligible for targeted and tailored support due to a lack of information on incidence, prevalence of disease, and even more basic data such as reliable estimates of the number of sex workers in the UK. Accordingly, the aim of this paper is to determine a population size estimate, with uncertainty, that is more robust than those currently available. In this study, we apply Bayesian Evidence Synthesis to bring together historic estimation efforts with recent ONS National Population Estimates and Genito-Urinary Medicine Clinics Attendance Data (GUMCAD) from the UK Health Security Agency (UKHSA). A key feature of our model is the embedding of uncertainty from each input study in model priors, hence propagating it through to our final estimate. The Bayesian evidence synthesis model estimated a total of 84,000 sex workers in the United Kingdom (95% credible interval: 49,000-130,000), representing 0.121% of the current UK population.

5

Diagnostic Delays Drive Transmission in Dense Cities: Modeling the Waiting-Window Effect and Its Mitigation

Bahig, S.; Oughton, M.; Vandesompele, J.; Brukner, I.

2026-04-22 epidemiology 10.64898/2026.04.20.26350946 medRxiv

Top 0.1%

18.1%

In dense urban settings, delays between diagnostic sampling and effective isolation can sustain transmission during peak infectiousness. We define a waiting-window transmission externality arising when infectious individuals remain mobile while awaiting results, formalized as E = N·P·TR·D, where N is daily testing volume, P test positivity, TR transmission during the waiting period, and D turnaround time. Using Monte Carlo simulation and a susceptible-infectious-recovered (SIR) framework, we quantify excess infections per 1,000 tests/day under multiple diagnostic workflows. A surge scenario incorporates positive coupling between TR and D (ρ = 0.45), reflecting co-occurrence of laboratory saturation and elevated contacts during system stress. Under centralized 48-hour workflows, excess infections reach ~80 at P = 10% and ~401 at P = 50%, increasing to ~628 under surge conditions. In contrast, near-patient rapid testing and home sampling reduce this to ~5 and ~25-26, respectively. Workflows that eliminate the waiting window, either through immediate isolation at sampling or through home-based PCR that returns results at the point of collection, effectively collapse the transmission term. These findings identify diagnostic delay as a modifiable driver of epidemic dynamics. Operational redesign of testing workflows, including decentralized sampling and home-based molecular diagnostics, offers a scalable pathway to improve epidemic controllability and reduce inequities in dense urban environments.
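
The product formula in this abstract can be evaluated directly. The sketch below is a deterministic point calculation only; the preprint uses Monte Carlo simulation and an SIR framework, which this does not reproduce. The value of TR (0.4 onward transmissions per infectious person per day of waiting) is a hypothetical choice that happens to land near the reported ~80 for a 48-hour workflow at P = 10%; it is not taken from the preprint.

```python
def waiting_window_excess(n_tests, positivity, tr_per_day, turnaround_days):
    """Excess infections E = N * P * TR * D for one day of testing."""
    return n_tests * positivity * tr_per_day * turnaround_days


if __name__ == "__main__":
    # N = 1,000 tests/day, D = 2 days (48 hours); TR = 0.4 is an assumed value.
    for positivity in (0.10, 0.50):
        print(positivity, waiting_window_excess(1000, positivity, 0.4, 2.0))
    # Setting D to zero (no waiting window) collapses the term entirely.
    print(waiting_window_excess(1000, 0.10, 0.4, 0.0))
```

Because the expression is a simple product, E scales linearly in each factor, so halving the turnaround time halves the excess under these assumptions.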

6

Sampling design and inference of the caecal-skin Campylobacter relationship in broilers

Mason, C.; Nunney, E.; Guitian, J.

2026-05-04 microbiology 10.64898/2026.05.03.722495 medRxiv

Top 0.1%

14.4%

The relationship between Campylobacter levels in broiler caeca and on carcass skin is central to quantitative microbial risk assessment along the poultry production chain, underpinning modelling of intervention impacts, including EFSA assessments of the public health impact of control measures. However, this relationship is typically inferred from monitoring data generated under sampling designs that do not preserve pairing between specimens and may involve pooling. In this study, we used a simulation framework to evaluate whether commonly used sampling strategies allow reliable recovery of the caecal-skin relationship. A simulated broiler population was generated, assigning caecal and skin loads to individual birds based on a specified linear relationship. Sampling was conducted under paired and unpaired designs, with and without pooling, reflecting approaches used in surveillance programmes and in policy-oriented models. Regression models were fitted to sampled data across 1,000 simulations for a range of assumed slopes. Under paired sampling, estimated slopes closely matched the true relationship across most scenarios. In contrast, unpaired sampling consistently failed to recover the association, with estimated slopes centred around zero regardless of the true slope. These findings were robust to variation in within-flock prevalence, residual error, and intercept. The results show that sampling design fundamentally affects identifiability of relationships between stages of the production chain. This has implications for interpretation of parameters derived from monitoring data and used in quantitative Campylobacter risk assessments informing policy. Parameters derived from unpaired and pooled monitoring data should therefore be interpreted with caution when used to support risk assessment and decision-making. Keywords: Campylobacter; broiler chickens; sampling strategy; unpaired sampling; carcass contamination; quantitative microbial risk assessment; simulation.
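
The contrast between paired and unpaired designs can be reproduced in a few lines. The sketch below simulates one flock with a known linear caecal-skin relationship, then fits an ordinary least squares slope to paired specimens (same bird) and to unpaired specimens (caecal and skin values from independently drawn birds). It omits pooling and within-flock prevalence, and the load distribution, residual error, and function names are hypothetical rather than the settings used in the preprint.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def compare_designs(true_slope=0.5, n_birds=5000, n_sampled=1000, seed=1):
    rng = random.Random(seed)
    # Assumed log10 caecal loads and a linear skin response with noise.
    caecal = [rng.gauss(6.0, 1.0) for _ in range(n_birds)]
    skin = [true_slope * c + rng.gauss(0.0, 0.5) for c in caecal]

    # Paired design: both specimens come from the same sampled birds.
    birds = rng.sample(range(n_birds), n_sampled)
    paired = ols_slope([caecal[i] for i in birds], [skin[i] for i in birds])

    # Unpaired design: caecal and skin specimens come from separate draws.
    caecal_birds = rng.sample(range(n_birds), n_sampled)
    skin_birds = rng.sample(range(n_birds), n_sampled)
    unpaired = ols_slope([caecal[i] for i in caecal_birds],
                         [skin[j] for j in skin_birds])
    return paired, unpaired


if __name__ == "__main__":
    print(compare_designs())
```

Once pairing is broken, the two columns are statistically independent, so the expected slope is zero whatever the true relationship is.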

7

Modeling the Impact of Exposed Cases in a Hantavirus Outbreak on a Cruise Ship

Cui, J.

2026-05-12 epidemiology 10.64898/2026.05.08.26352718 medRxiv

Top 0.1%

14.3%

The emergence of a hantavirus variant aboard a commercial cruise ship presents a significant public health concern. This study develops a discrete-time stochastic Susceptible-Exposed-Infectious-Recovered-Dead model to estimate transmission dynamics, hidden exposed infections, and outbreak risk among passengers and crew. Epidemiological parameters and latent disease states were inferred using an Ensemble Adjustment Kalman Filter calibrated to reported case data from WHO and ECDC situation reports. The estimated basic reproduction number was 2.76, with a 95% confidence interval of 2.52-2.99, indicating substantial potential for sustained onboard transmission before strict quarantine measures. Simulations further suggest that several exposed individuals may remain unidentified during the early outbreak phase, creating a hidden reservoir that symptom-based surveillance alone may fail to detect. These findings highlight the importance of rapid surveillance, widespread testing, targeted quarantine, and active monitoring of exposed individuals in confined travel settings. The proposed modeling framework can support timely outbreak assessment and intervention planning for infectious-disease events in similarly dense and spatially constrained populations.
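
For readers unfamiliar with the model class named in this abstract, the following is a minimal sketch of a discrete-time stochastic SEIRD simulation using binomial transitions. It does not include the Ensemble Adjustment Kalman Filter or any calibration to case data, and every parameter value (`beta`, `sigma`, `gamma`, `cfr`, the ship population) is a hypothetical placeholder rather than an estimate from the preprint.

```python
import math
import random


def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials (standard library only)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def seird_step(state, beta, sigma, gamma, cfr, rng):
    """Advance (S, E, I, R, D) by one day."""
    s, e, i, r, d = state
    alive = s + e + i + r
    p_infect = 1.0 - math.exp(-beta * i / alive) if alive > 0 else 0.0
    new_exposed = binomial(s, p_infect, rng)                    # S -> E
    new_infectious = binomial(e, 1.0 - math.exp(-sigma), rng)   # E -> I
    removed = binomial(i, 1.0 - math.exp(-gamma), rng)          # I -> R or D
    deaths = binomial(removed, cfr, rng)                        # share of removals that die
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - removed,
            r + removed - deaths,
            d + deaths)


def simulate(initial, days, beta, sigma, gamma, cfr, seed=0):
    rng = random.Random(seed)
    trajectory = [initial]
    for _ in range(days):
        trajectory.append(seird_step(trajectory[-1], beta, sigma, gamma, cfr, rng))
    return trajectory


if __name__ == "__main__":
    # Hypothetical ship of 1,000 people with 5 initially infectious.
    path = simulate((995, 0, 5, 0, 0), days=60,
                    beta=0.5, sigma=0.2, gamma=0.2, cfr=0.3, seed=42)
    for day in (0, 15, 30, 60):
        print(day, path[day])
```

The E compartment is the "hidden reservoir" the abstract describes: individuals there are infected but not yet infectious or symptomatic, so they do not appear in symptom-based case counts.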

8

A spatial EHR and wastewater-informed modeling framework for respiratory virus prediction under sparse and missing data conditions

Zhong, L.; Bleichrodt, A.; Pandey, A.; Kunkel, D.; Rennert, L.

2026-05-21 infectious diseases 10.64898/2026.05.18.26353485 medRxiv

Top 0.1%

14.3%

Wastewater-based epidemiology has emerged as a powerful complement to clinical surveillance for monitoring infectious disease dynamics. However, most existing approaches treat wastewater sites in isolation, overlooking spatial dependencies, and often fail to account for variability in data quality, limiting their ability to generate reliable predictions of healthcare demand. Here we present a spatial Bayesian renewal framework that integrates wastewater surveillance with mobility-informed spatial interactions while incorporating reliability-weighted wastewater signals. We apply the framework to three major respiratory pathogens, i.e., SARS-CoV-2, influenza, and respiratory syncytial virus (RSV), using wastewater and hospital data from counties in South Carolina. Across rolling four-week forecasts, the spatial framework consistently outperforms non-spatial approaches and remains robust even in counties lacking direct wastewater or hospitalization observations. Importantly, we show that county-level forecasts can be translated into facility-level predictions, enabling localized assessment of healthcare demand. These forecasts provide actionable early-warning signals to support hospital capacity planning, staffing decisions, and resource allocation. Together, this work establishes a scalable digital surveillance framework that integrates heterogeneous data sources for enabling more reliable infectious disease forecasting and supporting public health decision-making in underserved and data-limited settings.

9

Incorporating Uncertainty in Study Participants' Age in Serocatalytic Models

Chen, J.; Lambe, T.; Kamau, E.; Donnelly, C.; Lambert, B.; Bajaj, S.

2026-03-16 infectious diseases 10.64898/2026.03.14.26346885 medRxiv

Top 0.1%

12.7%

Serological surveys measure the presence of antibodies in a population to infer past exposure to an infectious pathogen. If study participants' ages are known, serocatalytic models can be used to retrace the historical transmission strength of a pathogen within that population, quantified by the force of infection (FOI). These models rely on age information as a key variable since infection risks are interpreted in relation to how long individuals have been at risk. However, due to data constraints, participants' ages may be provided only within "age bins". A common approach is then to assign individuals' ages to be midpoints of their respective age bins, ignoring uncertainty in this quantity. In this study, we quantify the bias introduced by this midpoint approach and develop a Bayesian framework that explicitly accounts for uncertainty in age. By comparing inference under constant, age-dependent, and time-dependent FOI scenarios, we show that incorporating uncertainty in age in serocatalytic models yields more reliable FOI estimates without sacrificing computational complexity. These improvements support the interpretation of serological data and inform public health decisions, such as estimating disease burden and identifying targeted vaccination groups.
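
The source of the midpoint bias can be seen in the simplest serocatalytic model. Under a constant FOI with no seroreversion, the probability of being seropositive at age a is 1 - exp(-FOI × a), which is concave in age, so evaluating it at a bin midpoint overstates the average over the bin. The sketch below compares the two, assuming ages are uniformly distributed within the bin; this is an illustration of the bias only, not the Bayesian framework from the preprint, and the FOI value and function names are hypothetical.

```python
import math


def seroprevalence(age, foi):
    """Constant-FOI serocatalytic model without seroreversion."""
    return 1.0 - math.exp(-foi * age)


def seroprevalence_at_midpoint(lower, upper, foi):
    """Midpoint approach: treat everyone in the bin as having the central age."""
    return seroprevalence((lower + upper) / 2.0, foi)


def seroprevalence_bin_average(lower, upper, foi):
    """Expected seroprevalence when age is uniform on [lower, upper]."""
    integral = (math.exp(-foi * lower) - math.exp(-foi * upper)) / foi
    return 1.0 - integral / (upper - lower)


if __name__ == "__main__":
    foi = 0.1  # hypothetical force of infection per year
    for lower, upper in ((0, 5), (0, 20), (20, 60)):
        print((lower, upper),
              seroprevalence_at_midpoint(lower, upper, foi),
              seroprevalence_bin_average(lower, upper, foi))
```

The gap widens as bins get wider, which is consistent with the abstract's motivation for treating age within a bin as uncertain rather than fixed.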

10

A structural Merton jump-diffusion framework for survival analysis: Modeling biological solvency and distance-to-death (DtD) in tuberculosis

Pefura-Yone, E. W.; Pefura-Yone, E. H.; Pefura-Yone, H. L. N.; Djenabou, A.; Balkissou, A. D.

2026-04-01 bioengineering 10.64898/2026.03.30.715204 medRxiv

Top 0.1%

12.7%

Tuberculosis (TB) remains a leading cause of death globally, with early mortality often driven by severe malnutrition and human immunodeficiency virus (HIV) co-infection. Traditional survival analyses identify risk factors but remain associative, failing to capture the dynamic physiological collapse preceding death. In a novel interdisciplinary adaptation, we applied the Merton jump-diffusion structural framework from quantitative finance to model survival as a state of biological solvency, in which mortality occurs when a stochastic health trajectory crosses a critical failure threshold. We analysed a retrospective cohort of 15,182 TB patients in Cameroon over two decades. Adjusted body mass index (BMI) was conceptualized as a proxy for health capital and modeled using a stochastic process accounting for individual recovery trends, physiological instability, and acute clinical shocks. The study included predominantly young adult males (median age: 33 years) with a median BMI of 20.7 kg/m². HIV co-infection was present in 35% of patients. The overall mortality rate during the 240-day follow-up period was 7.0%, with 55.1% of deaths occurring within the first 30 days. The model identified a critical failure threshold at BMI 17.329 kg/m². HIV co-infection emerged as a key driver of metabolic instability, significantly increasing physiological volatility. Statistical validation confirmed that sudden clinical shocks were necessary to explain observed mortality patterns. The resulting Distance-to-Death (DtD) metric slightly outperformed standard associative extended Cox models in predicting survival, achieving a higher discriminative ability in the testing set (Harrell's C-index: 0.781 vs. 0.772; p = 0.038). 
Patients stratified into the highest-risk category showed a mortality rate of 16.7%, compared with 1.6% in the most stable group. This study bridges financial engineering and clinical epidemiology, offering a mechanistic understanding of how physiological reserves and metabolic instability determine survival. To support clinical application, we developed an interactive digital triage tool enabling identification of high-risk patients in resource-limited settings. Author summary: Tuberculosis remains a major cause of death worldwide, particularly in people with poor nutrition or co-infection with HIV. In this study, we explored a new way to understand why some patients survive while others do not. We adapted a method originally used in finance to track the "health reserves" of patients over time, using body weight and related measures to estimate how close someone is to a critical health threshold. Our approach captures both gradual health decline and sudden medical complications, such as severe infections or rapid deterioration. By applying this method to a large group of patients in Cameroon, we found that a very low body weight is a strong warning sign for impending death and that HIV infection makes health outcomes less predictable. We also created a simple scoring tool that can help doctors identify patients at greatest risk, so that life-saving interventions and closer monitoring can be prioritized. This work bridges mathematical modeling and clinical care, offering a new way to assess patient vulnerability and improve outcomes in resource-limited settings.

11

Sample size in social contact surveys for epidemic modelling

Danon, L.; Brooks-Pollock, E.

2026-03-31 epidemiology 10.64898/2026.03.30.26349407 medRxiv

Top 0.1%

11.9%

Background: Social contact surveys, which measure who-contacts-whom, are widely used to inform infectious disease transmission models and estimate the reproduction number (R), a key metric for assessing epidemic risk. Despite their widespread use, sample size calculations are not routinely performed. Aims: To assess the impact of sample size on estimates of R and determine a practical target sample size for social contact surveys used in epidemic modelling. Methods: We conducted a review of social contact surveys (2008-2025) to characterise current practice. We characterised the impact of survey size on epidemic metrics using two social contact surveys, the UK Social Contact Survey and POLYMOD (Europe), and two methods. For each dataset and approach, we generated repeated subsamples and calculated the resulting reproduction numbers, characterised their distributions and measured uncertainty. Results: We identified 107 unique social contact surveys from 57 studies. Sample sizes ranged from 30 to more than 10,000 participants, with a median of 1,438. One quarter of surveys contained fewer than 1,000 participants. From our simulations, we find that sample sizes below 200 individuals can result in highly variable reproduction numbers. Increasing sample size increases precision, and the most meaningful gains are up to 1,300 individuals. Increasing sample sizes over 3,000 individuals leads to smaller gains. Conclusions: A minimum sample size of approximately 1,200-1,300 participants appears sufficient for general-purpose use. These findings support the inclusion of sample size considerations in the design, reporting and interpretation of social contact surveys used for epidemic intelligence and public health decision-making.
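
The repeated-subsampling procedure described in the Methods can be sketched as follows. The example draws subsamples of different sizes from a synthetic survey and measures how much a simple reproduction-number proxy, E[k²]/E[k] over reported contact counts k, varies between subsamples. The synthetic lognormal contact distribution, the proxy, and the function names are all assumptions for illustration; they are not the surveys or the two methods used in the preprint.

```python
import random
from statistics import mean, pstdev


def r_proxy(contact_counts):
    """Quantity proportional to R under proportionate mixing: E[k^2] / E[k]."""
    return mean([k * k for k in contact_counts]) / mean(contact_counts)


def synthetic_survey(n_participants, rng):
    """Hypothetical overdispersed daily contact counts (at least 1 each)."""
    return [int(rng.lognormvariate(2.0, 0.8)) + 1 for _ in range(n_participants)]


def subsample_spread(survey, size, replicates, rng):
    """Standard deviation of the proxy across repeated subsamples of one size."""
    estimates = [r_proxy(rng.sample(survey, size)) for _ in range(replicates)]
    return pstdev(estimates)


if __name__ == "__main__":
    rng = random.Random(7)
    survey = synthetic_survey(10000, rng)
    for size in (100, 500, 1300, 3000):
        print(size, subsample_spread(survey, size, replicates=200, rng=rng))
```

Sampling error falls roughly with the square root of sample size, which is one reason gains in precision flatten as surveys grow.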

12

A mathematical model for pertussis transmission and vaccination

Hounsell, R. A.; Norman, J.; Muloiwa, R.; Silal, S. P.

2026-03-17 infectious diseases 10.64898/2026.03.16.26348473 medRxiv

Top 0.1%

10.7%

Pertussis remains an endemic and periodically resurgent vaccine-preventable disease despite long-standing immunisation programmes, reflecting complex interactions between transmission, waning immunity, vaccination history, and heterogeneous clinical presentation. We present a comprehensive age-structured mathematical model of pertussis transmission that explicitly represents infection heterogeneity, immunity dynamics, and detailed vaccination schedules across the life course. The model stratifies the population into 56 age groups and 29 epidemiological states, capturing four distinct infection types that differ by severity, symptoms, and infectiousness, including asymptomatic infection. Both naturally acquired and vaccine-derived immunity are modelled as non-lifelong, incorporating waning, partial protection, reinfection, and immune boosting following exposure without transmissible infection. Vaccination is represented at high resolution, including dose-specific primary series vaccination, booster doses in early childhood, childhood, and adolescence, and maternal immunisation during pregnancy, with differentiation between whole-cell and acellular pertussis vaccine formulations and historical changes in vaccine use and coverage. Periodicity and stochasticity are incorporated to reproduce observed multi-year epidemic cycles. A global sensitivity analysis using Latin hypercube sampling and partial rank correlation coefficients identifies immunity waning rates, immune boosting, and recovery from severe infection as key drivers of modelled incidence, mortality, and population protection. By integrating detailed immune processes with realistic vaccination histories, this model provides a flexible framework for evaluating pertussis epidemiology and assessing the population-level impact of alternative vaccination strategies, including booster and maternal immunisation policies.

13

Modeling the impact of respiratory disease outbreaks on the United States agricultural workforce

Bardsley, K.; de Pablo, L. X.; Keppler Canada, E.; Ormaza Zulueta, N.; Mehrabi, Z.; Kissler, S. M.

2026-04-02 epidemiology 10.64898/2026.03.31.26349871 medRxiv

Top 0.1%

10.6%

Emerging respiratory disease outbreaks pose a major threat to food production systems. Agricultural workers live in larger, more crowded households than the general population, amplifying their potential exposure to respiratory pathogens, yet the consequences for worker health and food production remain poorly understood. We developed a household-structured susceptible-infectious-recovered (SIR) transmission model to compare disease dynamics between agricultural workers and the general U.S. population across six regions. We simulated outbreaks across a range of epidemiological scenarios and assessed productivity losses in California for three labor-intensive crops (oranges, iceberg lettuce, strawberries) with different harvest seasonalities. For a baseline reproduction number of R0 = 1.5, peak disease prevalence among agricultural workers was 1.23-1.45 times higher than that of the general population across regions, and final outbreak sizes were 1.15-1.28 times higher. Peak productivity losses ranged from 0.50%-0.62% across crops, translating to millions in lost revenue. At higher transmissibility and severity (R0 = 3 and assuming all infections are symptomatic), losses were over 2.5 times higher. Household crowding may lead to disproportionate respiratory disease burden among agricultural workers, highlighting the need for targeted outbreak preparedness and mitigation strategies in the agricultural sector to maintain food system resilience and support public health in these communities.

14

Development of an original algorithm to characterize serological antibody response that improve infectious diseases surveillance

RAZAFIMAHATRATRA, S. L.; RASOLOHARIMANANA, L. T.; ANDRIAMARO, T. M.; RANAIVOMANANA, P.; SCHOENHALS, M.

2026-04-24 epidemiology 10.64898/2026.04.16.26350925 medRxiv

Top 0.1%

10.5%

Interpreting serological data remains challenging, particularly in low-prevalence or cross-reactive contexts, where antibody responses often show substantial overlap between exposed and unexposed individuals and may depart from normal distributional assumptions. Conventional cutoff-based approaches often yield inconsistent or biased estimates of seroprevalence. Here, we present a decisional framework based on finite mixture models (FMMs) that enhances the robustness and interpretability of serological analyses. Beyond simply applying mixture models, our framework integrates multiple methodological innovations: (i) systematic comparison of Gaussian and skew-normal mixture models to accommodate asymmetric antibody distributions; (ii) rigorous model selection using the Cramér-von Mises test (p > 0.01) combined with a parsimonious score (APS) to prioritize models with well-separated clusters; and (iii) hierarchical clustering of posterior probabilities to collapse latent components into biologically meaningful seronegative and seropositive groups. Applied to chikungunya virus (CHIKV) data from Bangladesh, the framework produced prevalence estimates consistent with ROC-based methods while probabilistically identifying borderline cases. Validation on SARS-CoV-2 and dengue datasets further demonstrated its generalizability: for SARS-CoV-2, the approach identified up to five latent clusters with high sensitivity (up to 100%) and specificity (up to 100%), enabling discrimination by disease severity. For dengue, it revealed interpretable subgrouping consistent with background exposure and subclinical infection, despite limited confirmed cases. By integrating distributional flexibility, robust goodness-of-fit testing, and biologically guided cluster consolidation, this decisional FMM framework provides a reproducible and scalable method for serological interpretation across pathogens and epidemiological settings, addressing key limitations of threshold-based classification.

15

Basic Baseline model design choices can substantially influence performance in collaborative forecast hubs

Suez, E.; Fox, S. J.

2026-03-20 epidemiology 10.64898/2026.03.18.26348748 medRxiv

Top 0.1%

10.3%

Over the past decade, outbreak forecasting has become an increasingly used tool to assist public health decision-making during epidemics. Collaborative forecast hubs, where multiple teams submit predictions in real-time, are the gold standard for such efforts. For each hub, a Baseline model is used as a performance benchmark for other models. Although the Baseline is understood as a naive forecast, its design is subjective, and the impact of model design decisions remains understudied. We evaluated how three Baseline specification decisions influence forecast performance on trend models that forecast based on historically observed dynamics: (1) the amount of historical data used for training, (2) whether the data are transformed, and (3) whether forecasts follow a flatline variant (constant predictions) or a drift variant (allowing a slope). Retrospective forecasts were generated for multiple years across four surveillance targets: COVID-19, influenza and RSV hospital admissions, and weighted influenza-like illness percentage. For wILI, we additionally compared trend baselines with a seasonal baseline model leveraging long-term historical patterns. Model specification significantly altered performance. The optimal performing model across targets was a flatline model that used the most recent 6-12 transformed observations. The optimal model outperforms the current standard Baseline used in many forecast hubs by an average of 9.6% (range: 3.7-12.9%) across forecast targets, and it outperformed the seasonal baseline model by 32.3% across nine influenza seasons. Our results demonstrate that subjective Baseline design decisions can materially influence forecast accuracy and, consequently, the perceived rankings of models within collaborative forecast hubs. 
Based on the varying approaches and their performance differences, these findings highlight the need for increased transparency in Baseline model specifications and support the routine inclusion of multiple benchmark models within collaborative forecast hubs.
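
Two of the three Baseline design decisions in this abstract, the length of the training window and the flatline-versus-drift choice, can be shown with point forecasts alone. The sketch below is a simplified illustration: it omits data transformation and the predictive quantiles that hub baselines normally report, and the function name and example series are hypothetical.

```python
def baseline_forecast(history, horizon, window, variant="flatline"):
    """Point forecasts from the most recent `window` observations.

    flatline: repeat the last observed value at every horizon.
    drift:    extend the last value by the mean one-step change in the window.
    """
    recent = history[-window:]
    if len(recent) < 2:
        raise ValueError("window must cover at least two observations")
    last = recent[-1]
    if variant == "flatline":
        return [float(last)] * horizon
    if variant == "drift":
        diffs = [b - a for a, b in zip(recent, recent[1:])]
        slope = sum(diffs) / len(diffs)
        return [last + slope * h for h in range(1, horizon + 1)]
    raise ValueError("variant must be 'flatline' or 'drift'")


if __name__ == "__main__":
    admissions = [0, 10, 11, 12]  # hypothetical weekly hospital admissions
    print(baseline_forecast(admissions, horizon=2, window=3, variant="flatline"))
    print(baseline_forecast(admissions, horizon=2, window=3, variant="drift"))
    print(baseline_forecast(admissions, horizon=2, window=4, variant="drift"))
```

In this sketch the window length changes the drift forecast through the estimated slope, while the flatline point forecast depends only on the last observation; in a full baseline the window would also shape the flatline model's uncertainty.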

16

Federated analysis of incubation period distributions using individual-level observed data and heterogeneous summary statistics

Morgenstern, C.; Khurana, M. P.; Naidoo, T.; Rawson, T.; Cori, A.; Duchene, D. A.; Ferguson, N. M.; Kraemer, M. U. G.; Bhatt, S.

2026-06-02 epidemiology 10.64898/2026.06.01.26354607 medRxiv

Top 0.1%

10.1%

The incubation period, the interval between pathogen exposure and symptom onset, is a critical epidemiological parameter for follow-up policy and outbreak response, yet individual-level exposure data remain scarce, especially early in outbreaks. For most priority pathogens, only summary statistics are available because sharing of individual-level data can be sensitive. Here we introduce a Bayesian hierarchical framework that jointly models individual-level observations and published summary statistics under a unified federated analysis framework. Simulation studies demonstrate that the method accurately recovers incubation period distributions across a range of data availability scenarios, generally outperforming approaches that use published summary statistics alone. Applying the framework to 18 pathogens, including 10 priority pathogens classified to have outbreak potential by the World Health Organization, we find substantial between-study heterogeneity in incubation period estimates, including by outbreak country for SARS-CoV-1, variants of concern for COVID-19, and exposure setting for typhoid fever. These estimates, together with the curated dataset and modelling framework in our associated R package ddsynth, provide a reproducible foundation for improved incubation period estimation and synthesis across pathogens of epidemic concern. Our framework enables robust and rapid estimation of incubation periods during new outbreaks.

17

A Multi-Clique Network Model for Epidemic Spread with Fully Accessible Within-Group and Limited Between-Group Contacts

Smah, M. L.; Seale, A. C.; Rock, K. S.

2026-04-11 infectious diseases 10.64898/2026.04.08.26350390 medRxiv

Top 0.1%

10.0%

Network-based epidemic models have been instrumental in understanding how contact structure shapes infectious disease dynamics, yet widely used frameworks such as Erdős-Rényi, configuration-model, and stochastic block networks do not explicitly capture the combination of fully accessible (saturated) within-group interactions and constrained between-group connectivity characteristic of many real-world settings. Here, we introduce the Multi-Clique (MC) network model, a generative framework in which individuals are organised into fully connected cliques representing stable contact groups (e.g., households, classrooms, or workplaces), with a limited number of external connections governing inter-group transmission. Using stochastic susceptible-infectious-recovered (SIR) simulations on degree-matched networks, we compare epidemic dynamics on MC networks with those on classical random graph models. Despite having an identical mean degree, MC networks exhibit systematically distinct behaviour, including slower epidemic growth, reduced peak prevalence, increased fade-out probability, and delayed time to peak. These effects arise from rapid within-clique but constrained between-clique transmission, creating structural bottlenecks that standard models do not capture. The MC framework provides an interpretable, data-driven representation of recurrent contact structure, with parameters that map directly to observable quantities such as household and classroom sizes. By isolating the role of intergroup connectivity, the model offers a basis for evaluating targeted intervention strategies that reduce between-group mixing while preserving within-group interactions. Our results highlight the importance of explicitly representing the real-life clique-based network structure in epidemic models and suggest that classical degree-matched networks may systematically overestimate epidemic speed and intensity in structured populations.
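
As a rough illustration of the construction described in this abstract, the sketch below builds a graph of fully connected cliques joined by a small number of external links. The abstract does not say how external connections are placed, so the rule used here (each node draws a fixed number of partners uniformly at random from other cliques) is an assumption, as are the function name `multi_clique_graph` and its parameters.

```python
import itertools
import random


def multi_clique_graph(n_cliques, clique_size, external_per_node, seed=0):
    """Return {node: set(neighbours)} for a hypothetical multi-clique graph."""
    rng = random.Random(seed)
    n = n_cliques * clique_size
    adj = {v: set() for v in range(n)}
    clique_of = {v: v // clique_size for v in range(n)}

    # Saturated within-group contacts: every pair inside a clique is linked.
    for c in range(n_cliques):
        members = range(c * clique_size, (c + 1) * clique_size)
        for u, v in itertools.combinations(members, 2):
            adj[u].add(v)
            adj[v].add(u)

    # Limited between-group contacts (assumed rule, not taken from the paper):
    # each node draws `external_per_node` partners from outside its own clique.
    for u in range(n):
        outside = [v for v in range(n) if clique_of[v] != clique_of[u]]
        k = min(external_per_node, len(outside))
        for v in rng.sample(outside, k):
            adj[u].add(v)
            adj[v].add(u)
    return adj


graph = multi_clique_graph(n_cliques=5, clique_size=4, external_per_node=1, seed=1)
mean_degree = sum(len(nbrs) for nbrs in graph.values()) / len(graph)
print(f"nodes={len(graph)}, mean degree={mean_degree:.2f}")
```

Because external links are added symmetrically, a node can end up with more external neighbours than it drew itself; a degree-matched comparison like the one the abstract describes would need a stricter placement rule.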

18

Predicting COVID-19 incidence from seroprevalence and population-based cohort data using interpretable machine learning with differential privacy analysis

Krepel, J.; Binkyte, R.; Kerkouche, R.; Harries, M.; Klett-Tammen, C. J.; Fritz, M.; Kesselheim, S.; Kuehn, M.; Bazarova, A.; Lange, B.

2026-04-02 epidemiology 10.64898/2026.04.01.26349876 medRxiv

Top 0.1%

9.9%

During the COVID-19 pandemic, reported incidence data played a central role in public health surveillance and in tracking epidemic dynamics, although they provide limited insight into the behavioral, immunological, and socioeconomic drivers of transmission. Population-based seroprevalence studies with linked survey data offer a rich but untapped source of individual-level information that can complement routine surveillance. In this study, we investigate whether aggregated seroprevalence cohort data can be leveraged to predict local COVID-19 incidence and to identify interpretable predictors associated with transmission dynamics. Using data from the Multilocal SeroPrevalence (MuSPAD) study in Germany (2020-2022), we trained multiple machine learning models, including least absolute shrinkage and selection operator (LASSO), vector autoregressive models (VAR), multilayer perceptrons (MLPs), and long short-term memory neural networks (LSTMs), to predict location-specific seven-day incidence rates. Feature importance was assessed using regression coefficients where applicable and model-agnostic explainability methods, including Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). Across model classes, cohort-derived features enabled accurate prediction of local incidence, with time-aware models achieving the strongest performance. Consistent predictors included prior infection and testing history, employment-related changes, vaccination status, and mask-wearing behavior, highlighting the importance of behavioral and reporting-related signals. While differential privacy introduced modest degradation in predictive performance under strict privacy budgets, SHAP-based explanations remained stable, and LIME-based explanations were more sensitive to privacy-induced noise. These results demonstrate that aggregated cohort data encode meaningful and interpretable signals of population-level transmission dynamics.
Population-based serosurveys therefore provide a complementary source of information for predicting local COVID-19 incidence and identifying key drivers of transmission beyond routine surveillance data. Our findings show that integrating interpretable machine learning with privacy-aware analysis enables actionable insights from sensitive cohort data, supporting their use in digital epidemiology and informing data-driven public health decision-making.

19

Modeling the impact of adherence to U.S. isolation and masking guidance on SARS-CoV-2 transmission in office workplaces in 2021-2022

Garcia Quesada, M.; Wallrafen-Sam, K.; Kiti, M. C.; Ahmed, F.; Aguolu, O. G.; Ahmed, N.; Omer, S. B.; Lopman, B. A.; Jenness, S. M.

2026-04-21 epidemiology 10.64898/2026.04.14.26350639 medRxiv

Top 0.1%

9.8%

Non-pharmaceutical interventions (NPIs) have been important for controlling SARS-CoV-2 transmission, particularly before and during initial vaccine rollout. During the pandemic, the US Centers for Disease Control and Prevention issued isolation and masking guidance in case of COVID-19-like illness, a positive SARS-CoV-2 test, or known exposure to SARS-CoV-2. However, the impact of this guidance on mitigating transmission in office workplaces is unclear. We used a network-based mathematical model to estimate the impact of this guidance on SARS-CoV-2 transmission among office workers and their communities. The model represented social contacts in the home, office, and community. We used data from the CorporateMix study to parametrize social contacts among office workers and calibrated the model to represent the COVID-19 epidemic in Georgia, USA from January 2021 through August 2022. In the reference scenario (58% adherence to guidance among office workers and the broader population), workplace transmission accounted for a small fraction of total infections. Reducing adherence among office workers to 0% increased workplace transmissions by 27.1% and increasing adherence to 75% reduced workplace transmission by 7.0%. Increasing adherence to 75% among office workers had minimal impact on symptomatic cases and deaths; increasing it among the broader population was more effective in reducing office worker cases and deaths. In our model, moderate adherence to recommended NPIs in workplaces was effective in reducing transmission, but increasing adherence had limited benefit given workplaces that have low contact intensity and hybrid work arrangements. These results underscore the public health benefits of community-wide adoption of recommended NPIs.

20

Two anti-phase spatial modes and a candidate spatial-persistence regime transition of SARS-CoV-2 in Japan: a 159-week prefecture-level sentinel surveillance study

Nakano, T.; Onozuka, D.; Ikeda, Y.; Washiyama, K.; Takashima, Y.

2026-05-26 epidemiology 10.64898/2026.05.24.26353972 medRxiv

Top 0.1%

9.8%

Background. On 8 May 2023 the Japanese Ministry of Health, Labour and Welfare reclassified COVID-19 under the Infectious Disease Control Law from a designated infectious disease (with case-by-case reporting requirements comparable to those of a Category-2 disease) to a Category-5 ("Class-5") notifiable disease, joining the same category as seasonal influenza and most other endemic respiratory infections. Under this regime, COVID-19 case counts are reported weekly from a nationwide network of sentinel medical facilities (initially approximately 5,000, reduced to approximately 3,000 following an April 2025 surveillance reform), and individual case reporting is no longer required. We aimed to characterize the spatial topology of COVID-19 epidemics under this sentinel-surveillance regime and to detect, in a data-driven manner, any structural change in epidemic dynamics over this period. Methods. We analyzed weekly per-sentinel-facility COVID-19 case counts in all 47 prefectures of Japan from 2023-W17 to 2026-W19 (159 weeks). For each week we computed the Shannon pseudo-entropy S of the prefecture-share distribution and global, local, and time-lagged Moran's I across a 92-edge contiguity-based adjacency matrix. To identify any structural change in a data-driven manner, we adopted a two-stage approach motivated by an empirical regularity established in Section 3: we first verified the wave-amplitude-invariant entropy ceiling (S_max >= 3.80 in all five pre-transition waves), then restricted change-point detection to the weeks after S(t) last attained this ceiling, applying PELT, CUSUM, and Bai-Perron sup-F within this restricted region. Seasonal structure was characterized by truncated Fourier regression with first-order autoregressive errors (Cochrane-Orcutt) over harmonic orders K = 1 to 6; between-period comparisons used moving block bootstrap as the principal inferential statistic.
Results. The five epidemic waves during 2023-2025 followed a stereotyped spatial template in which S(t) traced a characteristic U-shape around each peak, with a wave-amplitude-invariant entropy ceiling reaching on average 99.4% of the theoretical maximum ln 47 (range 3.820-3.836, SD 0.006). The last week in which S(t) attained this entropy ceiling was 2025-W42. Restricting change-point detection to the 29 subsequent weeks, PELT and CUSUM localised the structural break to late 2025: PELT identified 2025-W48 (robust across penalty values >= sigma^2*ln(n) and across entropy-ceiling thresholds 3.78-3.82) and CUSUM peaked at 2025-W50 (p < 0.0001), placing the break within a two-week window centred on late November 2025. Bai-Perron sup-F peaked later at 2026-W02 (p = 0.062, with reduced power on n = 29). We adopted 2025-W48 as the principal change-point, defining 135 pre-transition weeks and 24 post-transition weeks. Two anti-phase spatial modes were identified in the pre-transition record: a summer-onset Okinawa-seeded Kyushu cascade (Mode A; annual peak epi week 26) and a winter-onset Tohoku-centred connected-cluster mode (Mode B; annual peak epi week 51), approximately 25 epi weeks out of phase. After the regime transition, this ceiling was not attained, and the spatial-persistence ratio I(tau = 8 wk)/I(0) shifted from a highly variable distribution centred near 0.27 (pre-transition, 125 weeks) to a tightly clustered distribution around 0.89 (post-transition, 24 weeks); the mean difference was 0.62 (95% bootstrap CI 0.32 to 0.90; moving block bootstrap p < 0.0001 across block lengths 1-12). The principal finding remained significant under autoregressive-augmented null models and was robust to adjacency-matrix choice, the April 2025 surveillance reform, harmonic order K = 1 to 6, and Okinawa exclusion.
Conclusions. Data-driven analysis of 159 weeks of Japanese sentinel surveillance identifies a candidate spatial-persistence regime transition emerging in late November 2025, in which the spatial structure of weekly case shares persists for at least 8 weeks rather than dissipating as in pre-transition. The transition coincides with loss of the wave-amplitude-invariant entropy ceiling and with absence of the Mode A signature through the observed post-transition period. The recent uptick in Okinawa case shares (continuing through 2026-W19) leaves open whether the Mode A signature is structurally suppressed or merely deferred; observation through summer 2026 is required to distinguish a sustained shift from a transient anomaly.
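
The two weekly statistics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes S is the natural-log Shannon entropy of the normalised case shares (consistent with the stated maximum of ln 47) and that Moran's I uses binary, symmetric contiguity weights; the abstract does not state the weighting scheme, and the function names and toy data are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Natural-log Shannon entropy of the share distribution of `counts`."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in shares)


def morans_i(values, edges):
    """Global Moran's I with binary symmetric weights (assumed scheme).

    `edges` is a list of undirected (i, j) index pairs between adjacent regions.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    # Each undirected edge contributes w_ij = w_ji = 1.
    cross = 2 * sum(dev[i] * dev[j] for i, j in edges)
    weight_sum = 2 * len(edges)
    return (n / weight_sum) * (cross / sum(d * d for d in dev))


# A uniform distribution over 47 regions reaches the ceiling ln 47.
print(shannon_entropy([1] * 47), math.log(47))

# Toy four-region chain: clustered values give positive I, alternating negative.
chain = [(0, 1), (1, 2), (2, 3)]
print(morans_i([1, 1, 0, 0], chain), morans_i([1, 0, 1, 0], chain))
```

A time-lagged variant, as used for the persistence ratio I(tau)/I(0), would pair deviations from week t with neighbours' deviations from week t + tau; that extension is omitted here because the abstract does not give its exact definition.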