Epidemics — Latest Matching Preprints

1

Using Bayesian Evidence Synthesis to estimate the number of sex workers in the United Kingdom

Long, H.; Gada, L.; Murray, L.; Laurence, T.; Hayward, A.; Finnie, T.

2026-05-26 public and global health 10.64898/2026.05.21.26353767 medRxiv

Top 0.1%

19.2%

Sex work is diverse and includes a broad range of people and settings. Over the last thirty years, a large proportion of public health emergencies of international concern (PHEIC) have involved infections transmitted through sexual or close contact and in sexual networks (WHO 2024). Sex workers can face increased disadvantage in relation to these public health emergencies. Given the significant health inequalities sex workers can face, they should be eligible to receive targeted and tailored health support to reduce health protection risks (Hester 2019; Jeal and Salisbury 2004a). However, they are often not explicitly eligible for targeted and tailored support due to a lack of information on incidence, prevalence of disease, and even more basic data such as reliable estimates of the number of sex workers in the UK. Accordingly, the aim of this paper is to determine a population size estimate, with uncertainty, that is more robust than those currently available. In this study, we apply Bayesian Evidence Synthesis to bring together historic estimation efforts with recent ONS National Population Estimates and Genito-Urinary Medicine Clinics Attendance Data (GUMCAD) from the UK Health Security Agency (UKHSA). A key feature of our model is the embedding of uncertainty from each input study in model priors, hence propagating it through to our final estimate. The Bayesian evidence synthesis model estimated a total of 84,000 sex workers in the United Kingdom (95% credible interval: 49,000-130,000), representing 0.121% of the current UK population.

2

Modeling the Impact of Exposed Cases in a Hantavirus Outbreak on a Cruise Ship

Cui, J.

2026-05-12 epidemiology 10.64898/2026.05.08.26352718 medRxiv

Top 0.1%

14.3%

The emergence of a hantavirus variant aboard a commercial cruise ship presents a significant public health concern. This study develops a discrete-time stochastic Susceptible-Exposed-Infectious-Recovered-Dead model to estimate transmission dynamics, hidden exposed infections, and outbreak risk among passengers and crew. Epidemiological parameters and latent disease states were inferred using an Ensemble Adjustment Kalman Filter calibrated to reported case data from WHO and ECDC situation reports. The estimated basic reproduction number was 2.76, with a 95% confidence interval of 2.52-2.99, indicating substantial potential for sustained onboard transmission before strict quarantine measures. Simulations further suggest that several exposed individuals may remain unidentified during the early outbreak phase, creating a hidden reservoir that symptom-based surveillance alone may fail to detect. These findings highlight the importance of rapid surveillance, widespread testing, targeted quarantine, and active monitoring of exposed individuals in confined travel settings. The proposed modeling framework can support timely outbreak assessment and intervention planning for infectious-disease events in similarly dense and spatially constrained populations.
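
A minimal sketch of the kind of discrete-time stochastic SEIRD model the abstract describes, written as a chain-binomial simulation. It omits the Ensemble Adjustment Kalman Filter calibration entirely. Every name and parameter value here (`simulate_seird`, `beta`, `sigma`, `gamma`, `cfr`, a ship population of 150) is a hypothetical choice for illustration and is not taken from the preprint. The only figure borrowed from the abstract is the reproduction number 2.76, used under the assumption that R0 is approximately `beta / gamma` in this simple formulation.

```python
import math
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_seird(n_people, beta, sigma, gamma, cfr, days, seed=0, initial_exposed=1):
    """Chain-binomial SEIRD simulation with one step per day.

    beta: transmission rate, sigma: E->I rate, gamma: I->removal rate (all per day),
    cfr: fraction of removals that die. All values are illustrative assumptions.
    Returns a list of (S, E, I, R, D) tuples, one per day including day 0.
    """
    rng = random.Random(seed)
    s, e, i, r, d = n_people - initial_exposed, initial_exposed, 0, 0, 0
    history = [(s, e, i, r, d)]
    for _ in range(days):
        # Convert daily rates into per-step transition probabilities.
        p_infect = 1.0 - math.exp(-beta * i / n_people)
        p_onset = 1.0 - math.exp(-sigma)
        p_remove = 1.0 - math.exp(-gamma)
        new_exposed = binomial(s, p_infect, rng)
        new_infectious = binomial(e, p_onset, rng)
        removed = binomial(i, p_remove, rng)
        deaths = binomial(removed, cfr, rng)
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - removed
        r += removed - deaths
        d += deaths
        history.append((s, e, i, r, d))
    return history


# Hypothetical run: gamma = 0.2 per day, beta chosen so that beta / gamma = 2.76.
history = simulate_seird(n_people=150, beta=2.76 * 0.2, sigma=0.1,
                         gamma=0.2, cfr=0.3, days=60, seed=1)
hidden_exposed_by_day = [state[1] for state in history]
```

The `hidden_exposed_by_day` series is the quantity the abstract calls the hidden reservoir: people in the E compartment who are infected but not yet detectable by symptom-based surveillance.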

3

A spatial EHR and wastewater-informed modeling framework for respiratory virus prediction under sparse and missing data conditions

Zhong, L.; Bleichrodt, A.; Pandey, A.; Kunkel, D.; Rennert, L.

2026-05-21 infectious diseases 10.64898/2026.05.18.26353485 medRxiv

Top 0.1%

14.3%

Wastewater-based epidemiology has emerged as a powerful complement to clinical surveillance for monitoring infectious disease dynamics. However, most existing approaches treat wastewater sites in isolation, overlooking spatial dependencies, and often fail to account for variability in data quality, limiting their ability to generate reliable predictions of healthcare demand. Here we present a spatial Bayesian renewal framework that integrates wastewater surveillance with mobility-informed spatial interactions while incorporating reliability-weighted wastewater signals. We apply the framework to three major respiratory pathogens, i.e., SARS-CoV-2, influenza, and respiratory syncytial virus (RSV), using wastewater and hospital data from counties in South Carolina. Across rolling four-week forecasts, the spatial framework consistently outperforms non-spatial approaches and remains robust even in counties lacking direct wastewater or hospitalization observations. Importantly, we show that county-level forecasts can be translated into facility-level predictions, enabling localized assessment of healthcare demand. These forecasts provide actionable early-warning signals to support hospital capacity planning, staffing decisions, and resource allocation. Together, this work establishes a scalable digital surveillance framework that integrates heterogeneous data sources to enable more reliable infectious disease forecasting and support public health decision-making in underserved and data-limited settings.

4

Federated analysis of incubation period distributions using individual-level observed data and heterogeneous summary statistics

Morgenstern, C.; Khurana, M. P.; Naidoo, T.; Rawson, T.; Cori, A.; Duchene, D. A.; Ferguson, N. M.; Kraemer, M. U. G.; Bhatt, S.

2026-06-02 epidemiology 10.64898/2026.06.01.26354607 medRxiv

Top 0.1%

10.1%

The incubation period, the interval between pathogen exposure and symptom onset, is a critical epidemiological parameter for follow-up policy and outbreak response, yet individual-level exposure data remain scarce, especially early in outbreaks. For most priority pathogens, only summary statistics are available because sharing of individual-level data can be sensitive. Here we introduce a Bayesian hierarchical framework that jointly models individual-level observations and published summary statistics under a unified federated analysis framework. Simulation studies demonstrate that the method accurately recovers incubation period distributions across a range of data availability scenarios, generally outperforming approaches that use published summary statistics alone. Applying the framework to 18 pathogens, including 10 priority pathogens classified by the World Health Organization as having outbreak potential, we find substantial between-study heterogeneity in incubation period estimates, including by outbreak country for SARS-CoV-1, variants of concern for COVID-19, and exposure setting for typhoid fever. These estimates, together with the curated dataset and modelling framework in our associated R package ddsynth, provide a reproducible foundation for improved incubation period estimation and synthesis across pathogens of epidemic concern. Our framework enables robust and rapid estimation of incubation periods during new outbreaks.

5

Two anti-phase spatial modes and a candidate spatial-persistence regime transition of SARS-CoV-2 in Japan: a 159-week prefecture-level sentinel surveillance study

Nakano, T.; Onozuka, D.; Ikeda, Y.; Washiyama, K.; Takashima, Y.

2026-05-26 epidemiology 10.64898/2026.05.24.26353972 medRxiv

Top 0.1%

9.8%

Background. On 8 May 2023 the Japanese Ministry of Health, Labour and Welfare reclassified COVID-19 under the Infectious Disease Control Law from a designated infectious disease (with case-by-case reporting requirements comparable to those of a Category-2 disease) to a Category-5 ("Class-5") notifiable disease, joining the same category as seasonal influenza and most other endemic respiratory infections. Under this regime, COVID-19 case counts are reported weekly from a nationwide network of sentinel medical facilities (initially approximately 5,000, reduced to approximately 3,000 following an April 2025 surveillance reform), and individual case reporting is no longer required. We aimed to characterize the spatial topology of COVID-19 epidemics under this sentinel-surveillance regime and to detect, in a data-driven manner, any structural change in epidemic dynamics over this period. Methods. We analyzed weekly per-sentinel-facility COVID-19 case counts in all 47 prefectures of Japan from 2023-W17 to 2026-W19 (159 weeks). For each week we computed the Shannon pseudo-entropy S of the prefecture-share distribution and global, local, and time-lagged Moran's I across a 92-edge contiguity-based adjacency matrix. To identify any structural change in a data-driven manner, we adopted a two-stage approach motivated by an empirical regularity established in Section 3: we first verified the wave-amplitude-invariant entropy ceiling (S_max >= 3.80 in all five pre-transition waves), then restricted change-point detection to the weeks after S(t) last attained this ceiling, applying PELT, CUSUM, and Bai-Perron sup-F within this restricted region. Seasonal structure was characterized by truncated Fourier regression with first-order autoregressive errors (Cochrane-Orcutt) over harmonic orders K = 1 to 6; between-period comparisons used moving block bootstrap as the principal inferential statistic. Results. 
The five epidemic waves during 2023-2025 followed a stereotyped spatial template in which S(t) traced a characteristic U-shape around each peak, with a wave-amplitude-invariant entropy ceiling reaching on average 99.4% of the theoretical maximum ln 47 (range 3.820-3.836, SD 0.006). The last week in which S(t) attained this entropy ceiling was 2025-W42. Restricting change-point detection to the 29 subsequent weeks, PELT and CUSUM localised the structural break to late 2025: PELT identified 2025-W48 (robust across penalty values >= sigma^2*ln(n) and across entropy-ceiling thresholds 3.78-3.82) and CUSUM peaked at 2025-W50 (p < 0.0001), placing the break within a two-week window centred on late November 2025. Bai-Perron sup-F peaked later at 2026-W02 (p = 0.062, with reduced power on n = 29). We adopted 2025-W48 as the principal change-point, defining 135 pre-transition weeks and 24 post-transition weeks. Two anti-phase spatial modes were identified in the pre-transition record: a summer-onset Okinawa-seeded Kyushu cascade (Mode A; annual peak epi week 26) and a winter-onset Tohoku-centred connected-cluster mode (Mode B; annual peak epi week 51), approximately 25 epi weeks out of phase. After the regime transition, this ceiling was not attained, and the spatial-persistence ratio I(tau = 8 wk)/I(0) shifted from a highly variable distribution centred near 0.27 (pre-transition, 125 weeks) to a tightly clustered distribution around 0.89 (post-transition, 24 weeks); the mean difference was 0.62 (95% bootstrap CI 0.32 to 0.90; moving block bootstrap p < 0.0001 across block lengths 1-12). The principal finding remained significant under autoregressive-augmented null models and was robust to adjacency-matrix choice, the April 2025 surveillance reform, harmonic order K = 1 to 6, and Okinawa exclusion. Conclusions. 
Data-driven analysis of 159 weeks of Japanese sentinel surveillance identifies a candidate spatial-persistence regime transition emerging in late November 2025, in which the spatial structure of weekly case shares persists for at least 8 weeks rather than dissipating as in pre-transition. The transition coincides with loss of the wave-amplitude-invariant entropy ceiling and with absence of the Mode A signature through the observed post-transition period. The recent uptick in Okinawa case shares (continuing through 2026-W19) leaves open whether the Mode A signature is structurally suppressed or merely deferred; observation through summer 2026 is required to distinguish a sustained shift from a transient anomaly.
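
The two weekly statistics named in the Methods, the Shannon entropy of the prefecture-share distribution and global Moran's I over a contiguity graph, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the Moran's I weights are assumed to be binary and symmetric (the abstract specifies only a 92-edge contiguity-based adjacency matrix, not the weighting scheme). With 47 prefectures, the theoretical entropy maximum is ln 47, roughly 3.850, so the reported ceiling of 99.4% corresponds to about 3.827, consistent with the stated range 3.820-3.836.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of the share distribution of non-negative counts."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in shares)


def morans_i(values, edges):
    """Global Moran's I with binary symmetric weights from an undirected edge list.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Each undirected edge (i, j) contributes w_ij = w_ji = 1.
    cross = 2.0 * sum(dev[i] * dev[j] for i, j in edges)
    weight_sum = 2.0 * len(edges)
    return (n / weight_sum) * cross / sum(x * x for x in dev)


# Equal shares across 47 units attain the theoretical maximum ln 47.
uniform_entropy = shannon_entropy([1.0] * 47)
ceiling_fraction = 0.994 * math.log(47)

# Toy example: four regions on a line with a smooth gradient (hypothetical data).
toy_i = morans_i([1.0, 2.0, 3.0, 4.0], [(0, 1), (1, 2), (2, 3)])
```

A spatial-persistence ratio like the abstract's I(tau = 8 wk)/I(0) would additionally need a time-lagged variant of `morans_i` that pairs values from week t with neighbours' values from week t + 8; that definition is not given in the abstract and is left out here.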

6

KESOZI Digital Twin: Physics-Informed Neural Network for Independent Estimation and Prediction of Childhood Diarrheal Disease Burden in Kenya, Somaliland, and Zimbabwe

KESOZI Digital Twin; Agumba, J. O.; Namusonge, L.; Ogendo, J.; Hassan, M. A.; Pembere, A.; Takavarasha, M.

2026-06-04 epidemiology 10.64898/2026.06.03.26354823 medRxiv

Top 0.1%

9.3%

Childhood diarrheal disease remains a leading cause of morbidity and mortality among children under five years in sub-Saharan Africa, particularly in settings affected by inadequate sanitation, climate variability, malnutrition, and limited healthcare access. Conventional forecasting approaches are often constrained by sparse surveillance data, weak spatial representation, and limited incorporation of mechanistic disease dynamics. This study presents a Physics-Informed Multimodal Artificial Intelligence Digital Twin framework that integrates Physics-Informed Neural Networks, Graph Neural Networks, diffusion-reaction epidemiological modeling, multimodal fusion learning, and Digital Twin simulation to estimate and predict childhood diarrheal disease burden in Kenya, Somaliland, and Zimbabwe. Using public epidemiological, environmental, climate, sanitation, and synthetic proof-of-concept datasets, the framework modeled temporal disease dynamics, spatial transmission, pathogen-attributed burden, and outbreak trajectories while enforcing epidemiological consistency through physics-informed optimization. Results demonstrated robust forecasting performance, enhanced spatial transmission modeling, uncertainty-aware predictions, and realistic outbreak simulations across the three countries. Rotavirus, Shigella, and Cryptosporidium were identified as major contributors to modeled mortality burden, while unsafe water exposure, poor sanitation, malnutrition, and climate-sensitive transmission substantially increased disease risk. Compared with a Bayesian baseline model, the multimodal framework achieved superior nonlinear risk characterization, geospatial learning, and temporal prediction. These findings highlight the potential of scientific machine learning and digital twin systems for infectious disease surveillance, outbreak forecasting, climate-health analytics, and evidence-based public health decision-making in low-resource African settings. 
Keywords: Physics-Informed Neural Networks, Graph Neural Networks, Digital Twin, Childhood Diarrheal Disease, Epidemiology, Kenya, Somaliland, Zimbabwe, Scientific Machine Learning, Spatial Epidemiology, Multimodal Fusion

7

A Decade of the Center for Disease Control and Prevention's FluSight Influenza Forecasting

Hines, A. G.; Mathis, S. M.; Johansson, M. A.; Biggerstaff, M.; Reed, C.; Borchering, R.

2026-06-08 epidemiology 10.64898/2026.06.05.26354941 medRxiv

Top 0.1%

9.0%

Since the U.S. 2013/14 influenza season, the CDC's FluSight Challenge has provided a platform for evaluating influenza forecasting models and fostering collaboration across institutions. The Challenge aims to improve the science and enhance the utility of infectious disease forecasts for public health decision making. We analyzed ten years of submitted forecasts (2014/15-2019/20 (influenza-like illness seasons) and 2021/22-2024/25 (hospital admissions seasons)) across a range of model types, including statistical, mechanistic, machine learning, and hybrid models. Influenza-like illness (ILI) forecasts were evaluated using the exponentiated logarithmic score (skill metric) while hospital admissions forecasts were evaluated using the log transformed relative Weighted Interval Score. Corresponding potential performance differences were assessed using Wilcoxon rank-sum tests, and associations with team participation history were evaluated using Spearman's rank correlation. Model performance varied by season, and no single model type consistently outperformed others. In ILI seasons, statistical models generally performed better than mechanistic and machine learning models, though consistent differences were not observed in more recent hospital admissions seasons. Ensemble forecasts showed better overall performance across seasons, and the CDC's FluSight ensemble ranked among the top-performing forecasts every year. We also found a positive correlation between forecast accuracy and the number of years a team participated in the Challenge, with statistically significant associations in four seasons. These findings highlight the benefits of ensemble approaches and sustained engagement in improving forecasting performance, while also underscoring the continued value of forecast evaluation before and following the COVID-19 pandemic. 
Insights from the FluSight Challenge can guide future infectious disease forecasting efforts and support more effective public health preparedness.
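
For readers unfamiliar with the Weighted Interval Score (WIS) used for the hospital admissions seasons, the sketch below implements the standard WIS for a forecast given as a median plus central prediction intervals. The function names and the example numbers are hypothetical. The `log_relative_wis` helper assumes that "log transformed relative WIS" means the natural log of a model's WIS divided by a baseline's WIS; the abstract does not spell out the exact definition or the baseline used.

```python
import math


def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval and observation y."""
    width = upper - lower
    below = (2.0 / alpha) * max(lower - y, 0.0)  # penalty when y falls below the interval
    above = (2.0 / alpha) * max(y - upper, 0.0)  # penalty when y falls above the interval
    return width + below + above


def weighted_interval_score(median, intervals, y):
    """WIS from a median and a dict mapping alpha -> (lower, upper).

    Lower is better; the score combines sharpness (interval width)
    with penalties for observations that fall outside the intervals.
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def log_relative_wis(model_wis, baseline_wis):
    """Assumed definition: log of the ratio of model WIS to baseline WIS."""
    return math.log(model_wis / baseline_wis)


# Hypothetical forecast: median 10 admissions with a 50% interval of (8, 12).
score_inside = weighted_interval_score(10.0, {0.5: (8.0, 12.0)}, y=10.0)
score_outside = weighted_interval_score(10.0, {0.5: (8.0, 12.0)}, y=14.0)
```

Under this assumed definition, a negative `log_relative_wis` means the model scored better than the baseline, and zero means the two were equal.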

8

Winter forecasting of respiratory viruses in Victoria Australia

Henderson, A. S.; Moss, R.; Adekunle, A. I.; Ye, H.; O'Hara-Wild, M.; Eales, O.; Senior, K. L.; Tobin, R.; Windecker, S. M.; Golding, N.; Robinson, E.; Strachan, J.; Hyndman, R. J.; Dawson, P.; McCaw, J.; McBryde, E.; Shearer, F. M.

2026-05-21 epidemiology 10.64898/2026.05.18.26353544 medRxiv

Top 0.1%

8.5%

Temperate regions of the world, such as southern Australia, often experience increased health burden from respiratory pathogens during winter. The ability to forecast short-term trends in cases of these pathogens is of significant interest to public health. Across the 2024 southern hemisphere winter period, the Australia--Aotearoa Consortium for Epidemic Forecasting and Analytics (ACEFA) ran a pilot respiratory virus forecasting initiative in collaboration with the Victorian Department of Health. Each week from the 9th of May 2024 through to 12th September 2024, the consortium solicited 28-day forecasts of daily case incidence for influenza, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and respiratory syncytial virus (RSV) from multiple research groups. Four component model forecasts were contributed by three different research groups, with a fourth group utilising the component forecasts to generate ensemble forecasts (making a total of six models, four component models and two ensembles). Here we statistically evaluated the performance of each forecast and a baseline model against the observed case data. The two ensemble models were found to be frequently the top performing models. All models performed worse than the baseline model around the epidemic peaks for each pathogen.

9

The AFRIDIARRHEA multimodal fusion framework for Estimating the Burden of Diarrheal Diseases Among Children Under Five in Kenya, Zimbabwe, and Somaliland

Agumba, J. O.; Namusonge, L.; AFRIDIARRHEA Consortium; Ogendo, J. O.; Hassan, M. A.; Waswa, L. M.; Takavarasha, M.; Shisanya, M. S.

2026-06-02 epidemiology 10.64898/2026.06.01.26354632 medRxiv

Top 0.1%

8.5%

Background: Accurate estimation of childhood diarrheal disease burden in Africa remains challenging because of limited surveillance, incomplete mortality data, pathogen-attribution uncertainty, and complex environmental and socioeconomic drivers. This study developed the African Diarrheal Disease Integrated Risk Intelligence and Burden Estimation Architecture (AFRIDIARRHEA), a multimodal fusion framework for estimating under-five diarrheal burden in resource-constrained settings. Methods: AFRIDIARRHEA integrates Bayesian epidemiological modeling, machine learning, temporal forecasting, geospatial analytics, pathogen attribution, environmental intelligence, and uncertainty quantification within a unified framework. Synthetic datasets representing Kenya, Zimbabwe, and Somaliland were used to evaluate mortality, morbidity, hospitalization burden, pathogen-attributed mortality, and predictive performance. Results: The framework identified substantial heterogeneity in disease burden across countries, with Zimbabwe exhibiting the highest modeled mortality and morbidity burden and Somaliland the highest hospitalization burden. Rotavirus and Shigella were the dominant contributors to pathogen-attributed mortality. The multimodal fusion model outperformed the Bayesian baseline and individual component models, achieving improved predictive accuracy, robust uncertainty calibration, and strong agreement with benchmark estimates. Conclusions: AFRIDIARRHEA demonstrates the potential of multimodal fusion modeling for integrated estimation of childhood diarrheal burden, pathogen attribution, and uncertainty in African settings. The framework provides a scalable, transparent, and policy-relevant approach for supporting vaccine prioritization, WASH investments, outbreak preparedness, and child survival programs in data-limited environments. Keywords: Diarrheal disease, burden estimation, multimodal fusion, pathogen attribution, machine learning, uncertainty quantification, Africa

10

The cost-effectiveness of testing and quarantine strategies to contain epidemic spread during the Hajj pilgrimage: A modelling study

Wardle, J.; Cori, A.; Hauck, K.; Nouvellet, P.; Bhatia, S.

2026-06-02 epidemiology 10.64898/2026.06.01.26354577 medRxiv

Top 0.1%

8.3%

The Hajj is an annual pilgrimage made by millions of Muslims to Mecca in the Kingdom of Saudi Arabia (KSA). The large number of international attendees at the Hajj increases the risk of global infectious disease spread. However, we know very little about the benefits, costs, and cost-effectiveness of testing and quarantining strategies to contain epidemic spread during mass gathering events. In this work we developed a stochastic discrete-time compartmental metapopulation model to simulate international epidemics of infectious pathogens and their potential importation into KSA during the Hajj. We used the model and an epidemic simulation study to evaluate the impact and cost-effectiveness of three testing and quarantining strategies for arriving pilgrims: randomly testing 99% of pilgrims, 80% of pilgrims, or using a symptom-based screening strategy. The simulations lasted 100 days, covering the 30 days before the Hajj and 65 days after the Hajj. Under the conditions assumed in our simulation study, there was strong evidence that testing and quarantining strategies are cost-effective measures for controlling epidemic threats at the Hajj. The median net monetary benefits of intervention strategies ranged from Intl$-41.89M [95% quantile range Intl$-42.37M to Intl$3.18B] to Intl$12.68B [Intl$-8.70B to Intl$13.82B] across scenarios with different pathogen characteristics (based on the natural histories of SARS-CoV-2 and H1N1 Influenza) and epidemic seed locations. Our results were sensitive to the data sources that were used to estimate the number of pilgrims travelling to KSA by origin country, with flight passenger statistics providing biased estimates of pilgrim numbers. Our work provides an adaptable tool to inform infectious disease risk assessments and evaluate the cost-effectiveness of possible disease control measures for the Hajj, and could be extended to other mass gathering events.

11

Limitations of cross-border containment strategies for Bundibugyo ebolavirus

Middleton, C.; Larremore, D.

2026-06-08 epidemiology 10.64898/2026.06.04.26354820 medRxiv

Top 0.1%

8.1%

An ongoing outbreak of Bundibugyo virus disease (BVD) in the Democratic Republic of the Congo was deemed a public health emergency of international concern in May 2026. To prevent cross-border importation, many countries, including the United States, Canada, India, Thailand, and Kenya, have already proposed containment strategies, and others are likely to follow suit. How well (or poorly) are screening and quarantine containment measures likely to work? We leverage established epidemiological theory and develop a mathematical model of traveler screening and post-arrival quarantine for BVD to answer this question. We find that traveler screening via symptom screening or molecular testing will miss the majority of infected travelers, and should be complemented by post-arrival quarantine and monitoring of sufficient duration to detect those with long incubation periods. Our findings underscore the limitations of border screening and the importance of complementary measures like post-arrival quarantine to prevent local importation of BVD.

12

Disentangling infectiousness and susceptibility by age group using transmission pair data: a study of SARS-CoV-2 household transmission

Leung, K. Y.; Miura, F.; Backer, J. A.

2026-06-05 epidemiology 10.64898/2026.06.04.26354892 medRxiv

Top 0.1%

8.0%

Background. Differential contributions to transmission across age groups have been reported for many respiratory infections, including SARS-CoV-2. They are crucial for estimating the impact of age-specific interventions. Disentangling these age-dependent contributions remains challenging, as they may reflect differences in contact rates, biological susceptibility, or infectiousness. Aim. We aim to jointly estimate age-specific per-contact infectiousness and susceptibility and their effect on the impact of age-specific interventions. Methods. The age-specific infectiousness and susceptibility were jointly estimated in a Bayesian framework by combining contact data with transmission pair data (who-infected-whom). We applied this approach to 197,840 self-reported household transmission pairs collected in the Netherlands during the COVID-19 pandemic. Using these estimates, we projected the expected impact of school closure and work-from-home measures during the early stages of an epidemic in the absence of other interventions. Results. Both infectiousness and susceptibility to SARS-CoV-2 infection were lowest in children aged 0-9 years and highest in adults over 30 years old, with 2- to 4.5-fold differences between these groups. Projected impacts of age-specific interventions indicated that school closures would reduce the reproduction number by 8% or 29% when age-specific susceptibility and infectiousness were or were not considered, respectively. Conversely, working-from-home policies would lead to reductions of 41% with and 20% without age-specific infectiousness and susceptibility. Conclusion. Our method enables robust estimation of age-specific infectiousness and susceptibility. Accounting for these age heterogeneities is essential for projecting the impact of age-targeted interventions. Our approach is adaptable to other respiratory infections and can guide more tailored public health responses.
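
One common way to see why age-specific susceptibility and infectiousness change the projected effect of an intervention is a next-generation matrix calculation, sketched below. This is an assumption about the general approach, not a description of the authors' model: the function names, the two-group contact matrix, and all parameter values are hypothetical. The reproduction number is taken as the dominant eigenvalue of a matrix with entries `susceptibility[i] * contacts[i][j] * infectiousness[j]`, computed here by power iteration.

```python
def dominant_eigenvalue(matrix, iterations=500):
    """Dominant eigenvalue of a non-negative square matrix via power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        value = max(abs(x) for x in nxt)
        if value == 0.0:
            return 0.0
        # Rescale so the largest entry is 1; value then converges to the eigenvalue.
        vec = [x / value for x in nxt]
    return value


def reproduction_number(contacts, susceptibility, infectiousness):
    """Dominant eigenvalue of K[i][j] = susceptibility[i] * contacts[i][j] * infectiousness[j]."""
    n = len(contacts)
    k = [[susceptibility[i] * contacts[i][j] * infectiousness[j] for j in range(n)]
         for i in range(n)]
    return dominant_eigenvalue(k)


# Hypothetical two-group example: group 0 = children, group 1 = adults.
contacts = [[2.0, 1.0],
            [1.0, 2.0]]
homogeneous = reproduction_number(contacts, [1.0, 1.0], [1.0, 1.0])
# Children assumed half as susceptible and half as infectious as adults.
heterogeneous = reproduction_number(contacts, [0.5, 1.0], [0.5, 1.0])
```

An intervention such as school closure could then be represented by scaling down the child-child entry of `contacts` and comparing the reproduction number before and after, once with and once without the age-specific factors.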

13

A Bayesian modelling framework for inference of latent infection risk patterns from virus neutralisation assay titration data

Alrefae, T. A.; Pons-Salort, M.; Donnelly, C. A.; Lambert, B.; Kamau, E.

2026-05-21 bioinformatics 10.64898/2026.05.18.726027 medRxiv

Top 0.1%

7.4%

Serological assays remain the standard experimental approach for estimating the cumulative incidence of a pathogen and monitoring population immunity. The predominant approach for analysing serum titration data from virus neutralisation assays uses a nearly century-old interpolation-based method which neglects inherent imperfections in the assay and produces estimates with no measure of uncertainty. We introduce a two-part Bayesian modelling framework to estimate the underlying antibody concentrations in the raw serum samples taken from serosurveyed individuals, to improve the interpretation of serological data over age. First, we develop a mechanistic Bayesian model for serum antibody titration data that estimates latent antibody concentrations while accounting for assay variability and quantifying uncertainty. Second, we propagate this uncertainty into an age-structured serocatalytic model by integrating over posterior draws of individual antibody concentrations, allowing joint inference on latent serostate membership, force of infection, and serological waning rate. We use this framework to explore the dynamics of infection and immunity for three enterovirus serotypes: enteroviruses A71 (EV-A71) and D68 (EV-D68) and coxsackievirus A6 (CVA6). These serotypes are leading causes of outbreaks of severe respiratory illness and hand, foot, and mouth disease. Applying these approaches to three cross-sectional serosurveys, we estimated consistently higher and more persistent antibody concentrations throughout life for EV-D68 compared to EV-A71 and CVA6. Our analysis suggests that the proportion of recently infected individuals (i.e. individuals with high estimated antibody concentration levels given their age) peaks around 25% by age 7 years for both EV-A71 and CVA6 before gradually declining with age. In contrast, for EV-D68 the inferred proportion of the population in the infected state exceeds 50% by age 9 years and continues to grow with age. 
We also estimate that EV-D68 antibody concentration levels are higher than those of the other two serotypes, with the force of infection estimated to be highest in early childhood and declining more gradually with age than for EV-A71 and CVA6. These estimates are different to previous estimates found in the literature. Our inferential framework uncovers the wide-ranging variation in antibody levels that are often obscured by conventional endpoint titre estimation methods. We demonstrate that our framework can infer infection rates without relying on predetermined seropositivity cut-offs and without making explicit assumptions of virus-specific infection mechanisms. Author summary: Serological tests measure antibody levels in blood to show how widely a virus has spread and how well populations are protected. Titre-based tests dilute blood samples in steps, mix these dilutions with virus, and add the mixture to living cells; the titre is the highest dilution where antibodies still protect cells from infection. Traditional analyses overlook test imperfections. We present a new two-part Bayesian framework to estimate antibody levels and track age-related exposure to infection. First, we estimate underlying antibody concentrations while accounting for uncertainty, then use these estimates in another model to infer age-specific transmission of three common viruses - EV-A71, EV-D68, and CVA6. Our results show that EV-D68 infections may be more common, especially in children, compared to the other viruses. This new approach provides a clearer picture of the dynamics of seroconversion, without relying on arbitrary thresholds, helping to improve public health monitoring and responses.

14

Detecting simulated pathogen releases in a real-world health data set

Moss, R.; Testolin, M. J.; Pitsaris, C.; Hill, A. M.; Muscatello, D. J.; McCaw, J. M.; Dawson, P.

2026-05-17 infectious diseases 10.64898/2026.05.12.26350999 medRxiv

Top 0.1%

7.3%

The purpose of electronic disease syndromic surveillance (EDSyS) systems is to detect hazardous pathogens and other unusual signals in health surveillance data before such events are identified by an individual clinician or healthcare facility. However, EDSyS systems have primarily been evaluated using simulated health surveillance data, which do not necessarily capture the richness and complexities of real-world health data. We have updated and extended an existing EDSyS system, EpiDefend, which combines ensemble forecasting and recursive Bayesian estimation in a particle filter framework that supports demographic and spatial structure. We simulated the release of several pathogens, both infectious and non-infectious, and injected the resulting cases into a real-world health data set. Here we evaluate EpiDefend's sensitivity and specificity in detecting these simulated releases, and measure the time to detection against pathogen-specific estimates of the time to clinical detection, as informed by clinicians and microbiologists. We show that for diseases where clinical diagnosis can be challenging, such as Q fever (Coxiella burnetii) and tularaemia (Francisella tularensis), EpiDefend can reliably beat the time to clinical detection. In contrast, for pathogens that can be clinically diagnosed relatively quickly, such as inhalational anthrax and pneumonic plague, it is extremely difficult to beat the time to clinical detection. Our results suggest that EpiDefend may be able to reliably detect real-world introductions or releases of some pathogens at low false-alarm rates before a clinical diagnosis would be confirmed, and this would represent a landmark achievement for EDSyS systems.

15

Simulating population compliance with pandemic interventions using large language models

Liu, R.; Jong, C.; Li, H.; Cao, Y.; Yao, Q.; Yamana, T.; Pei, S.; Du, H.

2026-05-15 infectious diseases 10.64898/2026.05.12.26352942 medRxiv

Top 0.1%

7.2%

Effective pandemic response requires accurate modeling of population compliance with non-pharmaceutical interventions (NPIs), yet most epidemic models treat behavioral change as fixed scenarios rather than an emergent process. Here, we test whether large language model (LLM)-based agents can generate individualized behavioral responses to time-varying NPIs and disease risk. We instantiate demographically representative agents in three U.S. cities (Boston, Denver, San Antonio) and condition them on evolving outbreak conditions and policies during the early COVID-19 pandemic, without fitting to observed mobility data. Across three frontier LLMs and their ensemble, agents generate zero-shot mobility changes across restaurants, retail, and entertainment venues, benchmarked against cellphone-derived foot-traffic records. The simulations recover average mobility trends across cities and venue types but exhibit overly narrow within-city variation. The three LLMs display distinct biases, while an ensemble approach improves robustness and overall performance. These findings establish LLM agents as a promising framework for modeling adherence to NPIs and highlight the need for further fine-tuning and empirical validation before they can support policy analysis.

16

Integrating vaccination with short-term behavioral guidance enables mpox outbreak control

Maniscalco, D.; Robineau, O.; Boelle, P.-Y.; Mailles, A.; Noel, H.; Tarantola, A.; Velter, A.; Colizza, V.

2026-05-28 infectious diseases 10.64898/2026.05.26.26354088 medRxiv

Top 0.2%

6.7%

Background. Despite the decline of the 2022 global outbreak, mpox remains an ongoing public health concern, with persistent transmission and emerging viral clades sustaining the risk of resurgence. Improving preparedness and response is a priority, yet it remains unclear how pre-exposure vaccination and community response can best limit transmission under realistic conditions, and whether behavioral adaptation is critical. Methods. We used a data-driven network model of mpox transmission among men who have sex with men (MSM) in the Paris region, parameterized with sexual behavioral data and calibrated to surveillance data from the 2022 outbreak. We evaluated counterfactual scenarios by varying vaccination timing, rollout speed, prioritization, and behavioral responses. Results. Here we show that, with respect to the 2022 epidemic in the Paris region, vaccination alone delivered at the observed rollout speed would not have reproduced the observed epidemic decline, even if initiated on the day of the first European alert, 12 days before the first case was reported in France. Achieving comparable control through vaccination alone would have required more than a fourfold increase in rollout speed. Large-scale and long-term reductions in sexual contacts remain instrumental to limit the epidemic size, although earlier vaccination reduces the proportion of MSM needing to change behavior. In contrast, short-term behavioral measures adopted by vaccinees, such as sexual abstinence during the 14-day immunity-building period, combined with a moderately faster vaccine rollout (+68% for 50% compliance; +34% for 75% compliance), could achieve comparable epidemic control. Targeting individuals with higher sexual activity further improved intervention efficiency. Conclusions. Under realistic reactive vaccination scenarios, mpox control still requires strong behavioral responses. Combining timely vaccination with short-term behavioral change guidance at vaccine administration offers a feasible path to limit transmission and strengthen outbreak preparedness and response.

17

Spatio-temporal machine learning for multi-horizon prediction of bluetongue outbreaks

Devlin, L. M.; Nguyen, P. H.; Cuthbert, R.; Doan, P. N.; Tran, V. H.; Zhang, Z.; Murchie, A. K.; Bamford, C. G. G.; Dick, J. T. A.; Morgan, E. R.; Mai, T. S.

2026-05-24 ecology 10.64898/2026.05.21.726753 medRxiv

Top 0.2%

6.5%

Reliable early warning of infectious disease outbreaks remains a major challenge for surveillance systems, particularly for vector-borne pathogens whose transmission depends on interactions among hosts, vectors, and climate-sensitive environmental conditions. Data-driven forecasting offers a promising approach for predicting outbreak risk using surveillance and environmental data. This study develops a logit-weighted ensemble (LWE), a machine-learning framework that predicts outbreak occurrence 1-6 months ahead at the administrative unit-month scale using routinely available outbreak notifications and gridded climate data. Bluetongue virus (BTV), an arbovirus of ruminants transmitted by Culicoides biting midges, provides a well-characterised system in which transmission is strongly shaped by climate, making it a useful system for applying and testing this approach. The framework is evaluated using surveillance data collected between 2005 and 2024 from France, Greece, and Italy, selected for their long-running and high-quality outbreak surveillance records. Across all three countries, the LWE achieved the strongest and most stable predictive performance under a recall-focused evaluation that prioritises correctly identifying outbreak months. It outperformed or matched 14 benchmark models, with differences becoming more pronounced at longer lead times (month +3 onward), when predictions are more uncertain and outbreaks are relatively rare. Predictability varied across countries, with the highest performance in Greece, strong performance in France, and lower, more variable performance in Italy, reflecting differences in how consistently outbreaks occur and spread across regions. Overall, the results demonstrate that horizon-aware, climate-informed forecasting can reliably identify months and locations at elevated risk of outbreak occurrence up to six months in advance, supporting surveillance planning and preparedness across heterogeneous European settings. 
The ensemble framework provides a robust and portable strategy for outbreak prediction using routinely collected surveillance and environmental data. Author Summary: Predicting infectious disease outbreaks before they occur remains a major challenge, particularly for diseases influenced by environmental conditions. In this study, we focus on bluetongue, a viral disease of livestock transmitted by biting midges, where transmission is strongly affected by climate and seasonal patterns. We develop a method that uses routinely collected outbreak reports and climate data to estimate where and when outbreaks are more likely to occur, up to six months in advance. We apply this approach across three European countries with a history of bluetongue outbreaks. We find that combining climate information with recent outbreak patterns can provide useful early signals of increased risk. Predictions are most accurate at shorter timeframes, but longer-range forecasts can still support planning and preparedness. Because our approach uses widely available data, it could be applied in other regions or to similar environmentally driven diseases. However, it does not include factors such as vaccination, animal movement, or detailed information on vector populations, which may also influence how outbreaks develop.

18

Local Influenza Forecasts Outperform State-Level Forecasts in the United States

Kim, D.; Pasco, R.; Johnson, K. E.; Fox, S. J.; Reich, N. G.; Meyers, L. A.

2026-06-08 infectious diseases 10.64898/2026.06.04.26354836 medRxiv

Top 0.2%

6.4%

Accurate outbreak forecasts are critical for timely and effective public health response. In the United States, however, most forecasts are produced at the state level, which can mask substantial sub-state heterogeneity and limit their utility for local planning. We generated and evaluated forecasts of the percentage of Emergency Department visits attributable to influenza across 173 large metropolitan Health Service Areas (HSAs) using a gradient boosting quantile regression (GBQR) model, and compared their accuracy to forecasts derived from state-level data alone. At one-, two-, and three-week horizons, local forecasts outperformed state-based forecasts in 98.8%, 90.8%, and 78.6% of HSAs, respectively, achieving mean weighted interval scores that were on average 39.2% lower (95% range: 5.9% to 76.7%), 19.6% lower (-6.3% to 59.5%), and 11.4% lower (-11.7% to 44.9%), respectively. The performance advantage of local forecasting was strongest in HSAs representing a smaller share of their state's population and increased with the proportion of the HSA population living in urban areas and the number of metropolitan areas within a state. These results, based on an analysis of HSAs with populations greater than 250,000, demonstrate that fine-scale modeling can substantially improve forecast accuracy and highlight the potential value of local forecasts for outbreak preparedness and response.
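
For readers unfamiliar with the metric named above, the following is a minimal sketch of the weighted interval score (WIS) as it is commonly defined for quantile forecasts, together with the percentage reduction used to compare two forecasts. The abstract does not state which interval levels were scored, so the levels, the observed value, and the forecast values below are illustrative assumptions, and the function names are hypothetical.

```python
def interval_score(y, lower, upper, alpha):
    """Interval score for a central (1 - alpha) prediction interval.

    Penalises interval width plus any miss, scaled by 2 / alpha.
    """
    width = upper - lower
    below = (2.0 / alpha) * max(lower - y, 0.0)  # observation under the interval
    above = (2.0 / alpha) * max(y - upper, 0.0)  # observation over the interval
    return width + below + above


def weighted_interval_score(y, median, intervals):
    """WIS from a median and a dict mapping alpha -> (lower, upper).

    Uses the conventional weights: 1/2 for the median term and
    alpha / 2 for each interval, normalised by (K + 1/2).
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(y, lower, upper, alpha)
    return total / (len(intervals) + 0.5)


def relative_reduction(local_wis, state_wis):
    """Percentage by which the local WIS is lower than the state-based WIS."""
    return 100.0 * (state_wis - local_wis) / state_wis


# Illustrative values only (not taken from the preprint):
# observed 4.0% of ED visits, forecast median 3.5%, 50% and 90% intervals.
example_wis = weighted_interval_score(
    y=4.0,
    median=3.5,
    intervals={0.5: (3.0, 4.5), 0.1: (2.0, 6.0)},
)
```

Lower WIS is better, so a positive relative reduction means the local forecast scored better than the state-based one, and a negative value (as at the lower end of the two- and three-week ranges above) means it scored worse.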

19

International risk of secondary hantavirus clusters following MV Hondius outbreak

Wang, B.; Lorenzetti, E.; Parino, F.; Colizza, V.; Valdano, E.

2026-05-22 public and global health 10.64898/2026.05.21.26353570 medRxiv

Top 0.2%

6.3%

The multinational Andes virus outbreak linked to the MV Hondius has exposed contacts across several countries, but the absence of further confirmed cases remains difficult to interpret given the long incubation period. We estimate the probability that secondary clusters may emerge using a stratified branching-process model parameterized with country-level tracing and isolation indicators. The risk of sustained spread is low, but secondary clusters remain plausible under imperfect isolation or pre-symptomatic transmission. These results support coordinated contact tracing and effective isolation while exposed contacts remain within the risk window.
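
As a rough illustration of how a branching process can turn isolation assumptions into a cluster probability, the sketch below simulates transmission chains seeded by infected contacts. It is not the authors' stratified model: the Poisson offspring distribution, the single effective reproduction number, the cluster-size threshold, and every parameter value are assumptions chosen only for illustration, and all function names are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's method (fine for small means)."""
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_onward_cases(rng, n_seed, r_eff, max_cases=200):
    """Total onward cases generated by n_seed initial cases.

    Each case infects Poisson(r_eff) others; the chain stops when it
    dies out or once max_cases is reached (a cap for r_eff >= 1).
    """
    active, total = n_seed, 0
    while active > 0 and total < max_cases:
        new_cases = sum(poisson(rng, r_eff) for _ in range(active))
        total += new_cases
        active = new_cases
    return total


def cluster_probability(n_infected_contacts, r0, isolation_effectiveness,
                        cluster_size=3, n_sims=2000, seed=0):
    """Monte Carlo estimate of P(at least cluster_size onward cases).

    Assumption: isolation scales transmission uniformly, so
    r_eff = r0 * (1 - isolation_effectiveness).
    """
    rng = random.Random(seed)
    r_eff = r0 * (1.0 - isolation_effectiveness)
    hits = sum(
        simulate_onward_cases(rng, n_infected_contacts, r_eff) >= cluster_size
        for _ in range(n_sims)
    )
    return hits / n_sims


# Hypothetical scenario: 5 infected contacts, r0 = 1.5 (illustrative only).
weak_isolation = cluster_probability(5, r0=1.5, isolation_effectiveness=0.5)
strong_isolation = cluster_probability(5, r0=1.5, isolation_effectiveness=0.9)
perfect_isolation = cluster_probability(5, r0=1.5, isolation_effectiveness=1.0)
```

A stratified version would run the same simulation with separate parameters per country and combine the results; that step is omitted here because the abstract does not describe the strata.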

20

Integrating patient movement and pathogen genomics to support hospital infection prevention with PathoPath: a method development study

Sajib, M. S.; Tanmoy, A. M.; Kanon, N.; Jui, A. B.; Islam, M. S.; Dola, N. Z.; Hossain, M. M.; Mobarak, R.; Shahidullah, M.; Hoque, M.; Ahmed, A. N. U.; Holmes, A. H.; Saha, S. K.; Saha, S.; Wan, Y.; Hooda, Y.

2026-06-05 infectious diseases 10.64898/2026.06.03.26354630 medRxiv

Top 0.3%

4.8%

Background. Healthcare-associated infections pose a major burden to neonatal health worldwide and remain difficult to track in low-resource hospitals because patient movement data and pathogen genomic data are rarely integrated into actionable transmission models. Existing approaches are often restricted to specific settings, highly structured electronic health records (EHRs), or analyses focused on either patient movements or pathogen characteristics alone. To address this gap, we developed PathoPath, an open-source integrative modelling platform, and evaluated its utility in a high-burden paediatric hospital in Dhaka, Bangladesh. Methods. PathoPath is an open-source R package that combines electronic health records with whole genome sequencing data to generate contact networks from direct and indirect contacts using minimal structured inputs. We retrospectively applied PathoPath to 373 cases of Klebsiella pneumoniae species complex (KpSC) infection identified in 2021 at the largest paediatric referral hospital in Dhaka, Bangladesh. Ward-level patient movement trajectories were used to reconstruct contact networks, and genomic data from isolates from children aged <60 days were integrated to identify probable dissemination of bacterial clones and antimicrobial resistance plasmids. Findings. PathoPath identified 750 direct contacts among 317 patients, forming 25 connected components, with the largest including 93 patients. KpSC infections were identified across 21 of 37 wards, with the neonatal intensive care unit accounting for 77.9% of all cases. Integration of genomic and network data distinguished sustained clustering of ST147 from multiple probable inter-clonal dissemination events involving IncFII plasmids carrying blaNDM-5 and/or blaOXA-181 within ST16. Four dominant sequence types accounted for 65.6% of sequenced isolates, and carbapenemase genes were detected in 95.8%. Interpretation. PathoPath reconstructs hospital-wide contact networks and integrates them with pathogen genomics to map probable dissemination of pathogens and antimicrobial resistance using minimal structured clinical data. It could support more targeted infection prevention and control in hospitals where granular digital records are not available.
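
The idea of deriving direct contacts and connected components from ward-level movement data can be sketched as follows. This is not PathoPath's API (PathoPath is an R package, and its actual contact definition is not given in the abstract); the sketch assumes a direct contact means two patients were in the same ward on overlapping days, and the record layout, patient identifiers, and function names are hypothetical.

```python
from itertools import combinations


def direct_contacts(stays):
    """Pairs of patients who shared a ward on overlapping days.

    stays: list of (patient, ward, start_day, end_day), days inclusive.
    """
    contacts = set()
    for a, b in combinations(stays, 2):
        same_ward = a[1] == b[1]
        overlap = a[2] <= b[3] and b[2] <= a[3]
        if a[0] != b[0] and same_ward and overlap:
            contacts.add(frozenset((a[0], b[0])))
    return contacts


def connected_components(patients, contacts):
    """Group patients into components with union-find, largest first."""
    parent = {p: p for p in patients}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for pair in contacts:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = {}
    for p in patients:
        groups.setdefault(find(p), set()).add(p)
    return sorted(groups.values(), key=len, reverse=True)


# Made-up ward stays for four hypothetical patients.
stays = [
    ("P1", "NICU", 1, 5),
    ("P2", "NICU", 4, 9),
    ("P2", "Ward3", 10, 12),
    ("P3", "Ward3", 11, 15),
    ("P4", "Ward7", 1, 3),
]
patients = sorted({stay[0] for stay in stays})
contacts = direct_contacts(stays)
components = connected_components(patients, contacts)
```

In this toy input, P1 and P3 never share a ward but end up in the same component through P2, which is the kind of indirect link that component analysis surfaces. Overlaying sequence types or plasmid profiles on each component would be a separate step not shown here.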