Digital Epidemiology for Epidemic Forecasting: Evaluating Twitter Emotion Signals and Google Symptom Searches Using Prophet and SARIMAX Models
Guigma, T. A.; Brooks, I.
Background: Digital traces from social media and online search platforms have been used to support infectious disease forecasting, but their performance during the COVID-19 pandemic has varied widely. It is still uncertain which types of digital signals add dependable information to established forecasting models.
Objective: This study evaluated whether Twitter indicators and Google symptom search trends improve forecasts of COVID-19 cases in the United States. A second goal was to examine whether any gains remain consistent across two different forecasting approaches, Prophet and SARIMAX.
Methods: National daily COVID-19 case data were linked with Twitter-based emotion and Google Trends symptom variables. A forward-selection procedure identified the strongest predictors from each source. Four Prophet models were trained and tested through rolling 30-day forecasts. The same predictor sets were then used in parallel SARIMAX models. Performance was assessed using RMSE, MAE, and MAPE, and results were inspected across major epidemic waves.
Results: Twitter indicators produced the clearest and most consistent improvements. In the Prophet models, the Twitter-enhanced version reduced 30-day forecast error by about 14% compared with the baseline. Google symptom searches showed smaller and less stable improvements, and combining Google trends with Twitter signals did not produce additional benefits. SARIMAX models showed the same general pattern, although improvements were more modest. Across epidemic waves, Twitter-based models reacted more quickly to shifts in transmission than the baseline model.
Conclusions: Twitter emotion indicators, especially neutral and informational posts, provided meaningful forecasting value across models and horizons. Google symptom searches contributed far less and did not strengthen performance when added to the Twitter predictors. The consistency of the findings across two modeling frameworks suggests that social media activity can offer reliable supplemental information for real-time epidemic forecasting. Continued work is needed to understand how these signals behave at finer spatial scales and in future outbreaks.
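As a rough illustration of the evaluation described in the abstract, the sketch below computes RMSE, MAE, and MAPE over rolling 30-day forecast windows and expresses a model's error as a percentage reduction relative to a baseline. It is not the authors' code. The abstract does not state how the rolling windows were constructed, so the minimum training length, the step between forecast origins, and the `last_value_forecast` placeholder model are all assumptions made for this example; in practice a Prophet or SARIMAX fit would take the place of the placeholder.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Days with zero actual cases are skipped to avoid division by zero
    (an assumption; the source does not say how zeros were handled).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def rolling_origin_scores(series, forecast_fn, horizon=30, min_train=60, step=30):
    """Score forecast_fn on successive windows of `horizon` days.

    At each origin the model sees series[:origin] and is scored on the
    next `horizon` values. min_train and step are hypothetical defaults.
    """
    scores = []
    origin = min_train
    while origin + horizon <= len(series):
        train = series[:origin]
        actual = series[origin:origin + horizon]
        predicted = forecast_fn(train, horizon)
        scores.append({
            "origin": origin,
            "rmse": rmse(actual, predicted),
            "mae": mae(actual, predicted),
            "mape": mape(actual, predicted),
        })
        origin += step
    return scores


def last_value_forecast(train, horizon):
    """Placeholder model (hypothetical): repeat the last observed value."""
    return [train[-1]] * horizon


def relative_reduction(baseline_error, model_error):
    """Percentage reduction in error relative to a baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error
```

With these definitions, a statement such as "reduced 30-day forecast error by about 14%" corresponds to `relative_reduction(baseline_error, model_error)` returning roughly 14 when both errors are averaged over the same forecast windows. Which metric the reported reduction refers to is not specified in the abstract.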
Matching journals
The top 5 journals account for 50% of the predicted probability mass.