
Bayesian generative modeling for heterogeneous wastewater data applied to COVID-19 forecasting

Johnson, K. E.; Vega Yon, G.; Brand, S. P. C.; Bernal Zelaya, C.; Bayer, D.; Volkov, I.; Susswein, Z.; Magee, A.; Gostic, K. M.; English, K. M.; Ghinai, I.; Hamlet, A.; Olesen, S. W.; Pulliam, J.; Abbott, S.; Morris, D. H.

2026-02-24 infectious diseases
10.64898/2026.02.23.26346887 medRxiv

Infectious disease forecasts can inform public health decision-making. Wastewater monitoring is a relatively new epidemiological data source with multiple potential applications, including forecasting. Incorporating wastewater data into epidemiological forecasting models is challenging, and relatively few studies have assessed whether doing so improves forecast performance. We present and evaluate a semi-mechanistic, wastewater-informed forecasting model. The model forecasts COVID-19 hospital admissions at the state and territorial levels in the United States. Its inputs are incident hospital admissions data and, optionally, SARS-CoV-2 wastewater concentration data from multiple wastewater sampling sites. From February through April 2024, we produced real-time wastewater-informed COVID-19 forecasts using development versions of the model and submitted them to the United States COVID-19 Forecast Hub ("the Hub"). We then published an open-source R package, wwinference, that implements the model with or without wastewater as an input. Using proper scoring rules and measures of model calibration, we assess both our real-time submissions to the Hub and retrospective hypothetical forecasts from wwinference made with and without wastewater data. The models performed similarly with and without the wastewater signal overall. However, there was substantial heterogeneity across individual locations and dates: in some cases wastewater data meaningfully improved the model's forecast performance, and in others it degraded it. Compared to other models submitted to the Hub during the period spanned by our submissions, the real-time wastewater-informed version of our model ranked fourth of 10 models, and the hospital admissions-only version ranked second of 10. Across the 2023-2024 winter epidemic wave, retrospective forecasts from wwinference would have performed similarly with and without the wastewater signal, ranking fifth and fourth of 10 models, respectively.
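The abstract does not say which proper scoring rule or calibration measure was used. As an illustration only, the sketch below assumes two choices:

- the weighted interval score (WIS), a proper scoring rule commonly applied to quantile forecasts collected by forecast hubs;
- empirical coverage of central prediction intervals, as the calibration measure.

All function names and the interval layout are hypothetical and are not the wwinference API.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval [lower, upper]."""
    score = upper - lower                      # penalty for interval width
    if y < lower:
        score += (2.0 / alpha) * (lower - y)   # penalty for undershooting
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)   # penalty for overshooting
    return score


def weighted_interval_score(median, intervals, y):
    """WIS for a forecast given as a median plus K central intervals.

    intervals: dict mapping alpha -> (lower, upper),
    e.g. {0.5: (l50, u50), 0.1: (l90, u90)}. Lower is better.
    """
    k = len(intervals)
    total = 0.5 * abs(y - median)              # median term, weight 1/2
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (k + 0.5)


def interval_coverage(intervals_per_forecast, observations, alpha):
    """Fraction of observations inside the central (1 - alpha) interval of their forecast."""
    if not observations:
        raise ValueError("need at least one observation")
    hits = sum(
        1
        for ints, y in zip(intervals_per_forecast, observations)
        if ints[alpha][0] <= y <= ints[alpha][1]
    )
    return hits / len(observations)
```

For a median of 100 with a 50% interval of (90, 110), a 90% interval of (70, 130), and an observed value of 120, the WIS is 11.2. A well-calibrated forecaster's 90% intervals (alpha = 0.1) should contain roughly 90% of observations across many forecasts.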
To better understand what drives differences in forecast performance with and without wastewater, we performed an exploratory analysis of how characteristics of the input data relate to improved or reduced performance in our model. Based on that analysis, we identify and discuss key areas for further model development. To our knowledge, this is the first work to evaluate real-time and retrospective infectious disease forecasts across the United States both with and without wastewater data, and in comparison to other forecasting models.

Author Summary

Wastewater-based epidemiology, in combination with clinical surveillance, has the potential to improve situational awareness and inform outbreak responses. We developed a model that produces short-term forecasts of hospital admissions. It combines pathogen concentration data from one or more wastewater treatment plants with hospital admissions data. We produced 28-day-ahead forecasts of COVID-19 hospital admissions from this model and submitted them to the U.S. COVID-19 Forecast Hub during spring 2024. The model performed well compared to other models during that limited period.

We also performed a retrospective analysis with two aims: to assess the added value of incorporating wastewater data, and to see how the model would have performed had we submitted it throughout the 2023-2024 winter epidemic wave. In this analysis, we produced forecasts from the model with and without wastewater data, using only data that would have been available in real time as of each forecast date. Both versions of the model would have been median overall performers had they been submitted to the Hub throughout the season. Comparing the model's performance with and without wastewater data, we found that overall forecast performance was very similar, with wastewater data slightly reducing average forecast performance.
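The retrospective forecasts use only data that would have been available in real time as of each forecast date. The sketch below shows one way to do that filtering, under two assumptions:

- each observation records both the date it refers to and the date it was reported (a hypothetical record layout, not the wwinference data format);
- later reports for the same date replace earlier ones.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    # Hypothetical record layout for illustration only.
    reference_date: date   # date the count or concentration refers to
    report_date: date      # date the value became available
    value: float


def as_of(observations, forecast_date):
    """Return the data a forecaster would have seen on forecast_date.

    Drops anything reported after forecast_date and, when a reference date
    was reported more than once, keeps the latest report available by then.
    """
    latest = {}
    for obs in observations:
        if obs.report_date > forecast_date:
            continue  # not yet available in real time
        current = latest.get(obs.reference_date)
        if current is None or obs.report_date > current.report_date:
            latest[obs.reference_date] = obs
    return sorted(latest.values(), key=lambda o: o.reference_date)
```

With this approach, a later revision of an earlier value only enters the forecasts made on or after its report date. This avoids using information that a real-time forecaster could not have had.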
Within this overall result, there was substantial heterogeneity, with clear instances of wastewater data both improving and degrading forecast performance. We used trends in the observed data to generate hypotheses about the drivers of improved and reduced relative forecast performance within our model. We conclude by suggesting future work to improve the model and, more broadly, the application of wastewater-based epidemiology to forecasting.
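One way to surface this kind of heterogeneity is to compare scores location by location and date by date. For each location and forecast date, take the score of the wastewater-informed model and divide it by the score of the admissions-only model. The sketch below makes three assumptions:

- scores are lower-is-better and strictly positive (for example, WIS);
- the tolerance band for "similar" is arbitrary;
- the function names are hypothetical.

It is illustrative only and is not the exploratory analysis from the paper.

```python
def compare_scores(scores_ww, scores_no_ww, tolerance=0.0):
    """Classify each (location, forecast_date) by whether wastewater helped.

    scores_ww, scores_no_ww: dicts mapping (location, forecast_date) -> score,
    lower is better, assumed strictly positive.
    Returns {key: (ratio, label)} for keys present in both dicts.
    """
    result = {}
    for key in sorted(scores_ww.keys() & scores_no_ww.keys()):
        ratio = scores_ww[key] / scores_no_ww[key]
        if ratio < 1.0 - tolerance:
            label = "improved"   # wastewater version scored better
        elif ratio > 1.0 + tolerance:
            label = "degraded"   # wastewater version scored worse
        else:
            label = "similar"
        result[key] = (ratio, label)
    return result


def overall_ratio(scores_ww, scores_no_ww):
    """Ratio of summed scores over shared keys: an aggregate comparison."""
    shared = scores_ww.keys() & scores_no_ww.keys()
    return sum(scores_ww[k] for k in shared) / sum(scores_no_ww[k] for k in shared)
```

As a hypothetical example, suppose two locations have wastewater-model scores of 8 and 12 and admissions-only scores of 10 and 10. One pair is labeled improved and the other degraded, yet the overall ratio is exactly 1.0. An aggregate near parity can therefore hide large location-level differences.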

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Epidemics: 104 papers in training set; Top 0.1%; probability 9.9%
2. Science of The Total Environment: 179 papers in training set; Top 1.0%; probability 8.2%
3. PLOS Computational Biology: 1633 papers in training set; Top 5%; probability 7.0%
4. PLOS ONE: 4510 papers in training set; Top 29%; probability 6.2%
5. Environment International: 42 papers in training set; Top 0.3%; probability 6.2%
6. ACS ES&T Water: 18 papers in training set; Top 0.1%; probability 6.2%
7. GeoHealth: 10 papers in training set; Top 0.1%; probability 4.7%
8. PLOS Global Public Health: 293 papers in training set; Top 2%; probability 3.6%
50% of probability mass above
9. mSystems: 361 papers in training set; Top 3%; probability 3.5%
10. Scientific Reports: 3102 papers in training set; Top 39%; probability 3.5%
11. FACETS: 11 papers in training set; Top 0.1%; probability 2.3%
12. JMIR Public Health and Surveillance: 45 papers in training set; Top 1%; probability 2.3%
13. Environmental Health Perspectives: 17 papers in training set; Top 0.2%; probability 2.0%
14. PeerJ: 261 papers in training set; Top 6%; probability 1.8%
15. Infectious Disease Modelling: 50 papers in training set; Top 0.8%; probability 1.7%
16. BMC Infectious Diseases: 118 papers in training set; Top 3%; probability 1.4%
17. American Journal of Infection Control: 12 papers in training set; Top 0.2%; probability 1.3%
18. Environmental Research: 46 papers in training set; Top 1%; probability 1.3%
19. Water Research: 74 papers in training set; Top 1%; probability 1.3%
20. One Health: 29 papers in training set; Top 0.8%; probability 1.3%
21. International Journal of Environmental Research and Public Health: 124 papers in training set; Top 6%; probability 0.9%
22. BMC Medical Research Methodology: 43 papers in training set; Top 1%; probability 0.9%
23. Frontiers in Public Health: 140 papers in training set; Top 7%; probability 0.9%
24. JAMIA Open: 37 papers in training set; Top 1%; probability 0.9%
25. Spatial and Spatio-temporal Epidemiology: 10 papers in training set; Top 0.2%; probability 0.8%
26. Journal of the American Medical Informatics Association: 61 papers in training set; Top 2%; probability 0.8%
27. Frontiers in Microbiology: 375 papers in training set; Top 9%; probability 0.7%
28. npj Digital Medicine: 97 papers in training set; Top 4%; probability 0.7%
29. Environmental Science & Technology: 64 papers in training set; Top 2%; probability 0.7%
30. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 48%; probability 0.6%
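The statement that the top 8 journals account for 50% of the predicted probability mass can be checked against the listed per-journal probabilities. The first seven sum to 48.4% and the first eight to 52.0%, so eight is the smallest number of journals reaching 50%. A small hypothetical helper that finds this cutoff:

```python
def smallest_top_k(probabilities, threshold):
    """Return the smallest k whose k largest probabilities sum to >= threshold.

    Returns None if all probabilities together stay below the threshold.
    """
    cumulative = 0.0
    for k, p in enumerate(sorted(probabilities, reverse=True), start=1):
        cumulative += p
        if cumulative >= threshold:
            return k
    return None


# Predicted probabilities (in percent) for the ten highest-ranked journals listed above.
top_ten = [9.9, 8.2, 7.0, 6.2, 6.2, 6.2, 4.7, 3.6, 3.5, 3.5]
print(smallest_top_k(top_ten, 50.0))  # 8
```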