Bayesian generative modeling for heterogeneous wastewater data applied to COVID-19 forecasting
Johnson, K. E.; Vega Yon, G.; Brand, S. P. C.; Bernal Zelaya, C.; Bayer, D.; Volkov, I.; Susswein, Z.; Magee, A.; Gostic, K. M.; English, K. M.; Ghinai, I.; Hamlet, A.; Olesen, S. W.; Pulliam, J.; Abbott, S.; Morris, D. H.
Infectious disease forecasts can inform public health decision-making. Wastewater monitoring is a relatively new epidemiological data source with multiple potential applications, including forecasting. Incorporating wastewater data into epidemiological forecasting models is challenging, and relatively few studies have assessed whether this improves forecast performance. We present and evaluate a semi-mechanistic wastewater-informed forecasting model. The model forecasts COVID-19 hospital admissions at the state and territorial levels in the United States, based on incident hospital admissions data and, optionally, SARS-CoV-2 wastewater concentration data from multiple wastewater sampling sites. From February through April 2024, we produced real-time wastewater-informed COVID-19 forecasts using development versions of the model and submitted them to the United States COVID-19 Forecast Hub ("the Hub"). We then published an open-source R package, wwinference, that implements the model with or without wastewater as an input. Using proper scoring rules and measures of model calibration, we assess both our real-time submissions to the Hub and retrospective hypothetical forecasts from wwinference made with and without wastewater data. While the models performed similarly with and without the wastewater signal included, there was substantial heterogeneity for individual locations and dates where wastewater data meaningfully improved or degraded the model's forecast performance. Compared to other models submitted to the Hub during the period spanned by our submissions, the real-time wastewater-informed version of our model ranked fourth out of 10 models, with the hospital admissions-only version of our model ranking second out of 10 models. Across the 2023-2024 winter epidemic wave, retrospective forecasts from wwinference would have performed similarly with and without the wastewater signal included: fifth and fourth out of 10 models, respectively.
To better understand the drivers of differential forecast performance with and without wastewater, we performed an exploratory analysis investigating the relationship between characteristics of the input data and improved and reduced performance in our model. Based on that analysis, we identify and discuss key areas for further model development. To our knowledge, this is the first work to evaluate real-time and retrospective infectious disease forecasts across the United States both with and without wastewater data and in comparison to other forecasting models.
Author Summary
Wastewater-based epidemiology, in combination with clinical surveillance, has the potential to improve situational awareness and inform outbreak responses. We developed a model that uses data on the pathogen concentration in wastewater from one or more wastewater treatment plants in combination with hospital admissions to produce short-term forecasts of hospital admissions. We produced and submitted forecasts of 28-day-ahead COVID-19 hospital admissions from this model to the U.S. COVID-19 Forecast Hub during the spring of 2024 and found that it performed well in comparison to other models during that limited time period. To assess the added value of incorporating wastewater data into the model and to investigate how it would have performed had we submitted it during the entire 2023-2024 winter epidemic wave, we performed a retrospective analysis in which we produced forecasts from the model with and without wastewater data, using data that would have been available in real time as of each forecast date. Both versions of the model would have been median overall performers had they been submitted to the Hub throughout the season. When comparing the model's performance with and without wastewater data included, we found that overall forecast performance was very similar, with wastewater data slightly reducing overall average forecast performance.
Within this result, there was significant heterogeneity, with clear instances of wastewater data improving and detracting from forecast performance. We used trends in the observed data to generate hypotheses as to the drivers of improved and reduced relative forecast performance within our model. We conclude by suggesting future work to improve the model and more broadly the application of wastewater-based epidemiology to forecasting.
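As an illustration of what scoring a forecast with a proper scoring rule can look like, the following is a minimal Python sketch of the weighted interval score (WIS), a proper scoring rule commonly applied to quantile-based forecasts. The abstract does not name the specific scoring rules used, so the choice of WIS is an assumption, and the function names `interval_score` and `weighted_interval_score` are hypothetical and not part of wwinference. Lower scores indicate better forecasts.

```python
from typing import Dict, Tuple


def interval_score(lower: float, upper: float, observed: float, alpha: float) -> float:
    """Interval score for a central (1 - alpha) prediction interval."""
    width = upper - lower
    # Penalties apply only when the observation falls outside the interval.
    penalty_below = (2.0 / alpha) * max(lower - observed, 0.0)
    penalty_above = (2.0 / alpha) * max(observed - upper, 0.0)
    return width + penalty_below + penalty_above


def weighted_interval_score(
    median: float,
    intervals: Dict[float, Tuple[float, float]],
    observed: float,
) -> float:
    """WIS from a predictive median and central intervals keyed by alpha.

    `intervals` maps alpha (e.g. 0.1 for a 90% interval) to (lower, upper).
    """
    total = 0.5 * abs(observed - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, observed, alpha)
    return total / (len(intervals) + 0.5)


# Hypothetical forecast of hospital admissions (illustrative numbers only).
example_wis = weighted_interval_score(
    median=100.0,
    intervals={0.5: (90.0, 110.0), 0.1: (70.0, 130.0)},
    observed=120.0,
)
```

Under this sketch, comparing forecasts made with and without wastewater data would amount to computing the score for each version on the same observed admissions and comparing the averages across locations and forecast dates.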