
Digital Epidemiology for Epidemic Forecasting: Evaluating Twitter Emotion Signals and Google Symptom Searches Using Prophet and SARIMAX Models

Guigma, T. A.; Brooks, I.

2026-01-21 epidemiology
10.64898/2026.01.18.26344365 medRxiv

Background: Digital traces from social media and online search platforms have been used to support infectious disease forecasting, but their performance during the COVID-19 pandemic has varied widely. It is still uncertain which types of digital signals add dependable information to established forecasting models.

Objective: This study evaluated whether Twitter indicators and Google symptom search trends improve forecasts of COVID-19 cases in the United States. A second goal was to examine whether any gains remain consistent across two different forecasting approaches, Prophet and SARIMAX.

Methods: National daily COVID-19 case data were linked with Twitter-based emotion and Google Trends symptom variables. A forward-selection procedure identified the strongest predictors from each source. Four Prophet models were trained and tested through rolling 30-day forecasts. The same predictor sets were then used in parallel SARIMAX models. Performance was assessed using RMSE, MAE, and MAPE, and results were inspected across major epidemic waves.

Results: Twitter indicators produced the clearest and most consistent improvements. In the Prophet models, the Twitter-enhanced version reduced 30-day forecast error by about 14% compared with the baseline. Google symptom searches showed smaller and less stable improvements, and combining Google trends with Twitter signals did not produce additional benefits. SARIMAX models showed the same general pattern, although improvements were more modest. Across epidemic waves, Twitter-based models reacted more quickly to shifts in transmission than the baseline model.

Conclusions: Twitter emotion indicators, especially neutral and informational posts, provided meaningful forecasting value across models and horizons. Google symptom searches contributed far less and did not strengthen performance when added to the Twitter predictors. The consistency of the findings across two modeling frameworks suggests that social media activity can offer reliable supplemental information for real-time epidemic forecasting. Continued work is needed to understand how these signals behave at finer spatial scales and in future outbreaks.
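
The evaluation design described in the abstract (rolling 30-day forecasts scored with RMSE, MAE, and MAPE, then compared against a baseline) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the `seasonal_naive` stand-in forecaster, the 30-day step between forecast origins, and the synthetic series are all assumptions made for illustration. In practice a Prophet or SARIMAX model with the selected regressors would take the place of the `forecaster` callable.

```python
import math
from typing import Callable, Dict, List, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Days with zero reported cases are skipped to avoid division by zero
    (an assumption; the abstract does not say how zeros were handled).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return float("nan")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def seasonal_naive(history: Sequence[float], horizon: int, period: int = 7) -> List[float]:
    """Hypothetical stand-in forecaster: repeat the last `period` observations."""
    last = list(history[-period:])
    return [last[i % period] for i in range(horizon)]


def rolling_evaluation(
    series: Sequence[float],
    forecaster: Callable[[Sequence[float], int], List[float]],
    initial_train: int,
    horizon: int = 30,
    step: int = 30,
) -> List[Dict[str, float]]:
    """Rolling-origin evaluation.

    At each origin the forecaster sees only data before the origin and is
    scored on the next `horizon` days; the origin then advances by `step`.
    """
    scores = []
    origin = initial_train
    while origin + horizon <= len(series):
        train = series[:origin]
        actual = series[origin:origin + horizon]
        predicted = forecaster(train, horizon)
        scores.append({
            "origin": origin,
            "rmse": rmse(actual, predicted),
            "mae": mae(actual, predicted),
            "mape": mape(actual, predicted),
        })
        origin += step
    return scores


def percent_reduction(baseline_error: float, model_error: float) -> float:
    """Relative error reduction of a model versus a baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    # Synthetic daily series with a weekly pattern plus a slow upward trend.
    cases = [100 + 10 * (day % 7) + 0.5 * day for day in range(180)]
    for fold in rolling_evaluation(cases, seasonal_naive, initial_train=60):
        print(fold)
```

Averaging a metric over the folds for a baseline and for a signal-enhanced model, then passing both averages to `percent_reduction`, gives the kind of relative improvement figure quoted in the Results.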

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Each entry lists the journal, the number of its papers in the training set, its "Top" percentage as displayed, and the predicted probability.

1. Journal of Medical Internet Research (85 papers in training set; Top 0.2%; 17.9%)
2. BMC Infectious Diseases (118 papers in training set; Top 0.1%; 17.9%)
3. PLOS ONE (4510 papers in training set; Top 29%; 6.1%)
4. PLOS Computational Biology (1633 papers in training set; Top 6%; 6.1%)
5. Epidemics (104 papers in training set; Top 0.2%; 6.1%)

50% of probability mass above

6. Scientific Reports (3102 papers in training set; Top 25%; 4.7%)
7. JMIR Public Health and Surveillance (45 papers in training set; Top 0.7%; 3.5%)
8. Infectious Disease Modelling (50 papers in training set; Top 0.5%; 3.4%)
9. Wellcome Open Research (57 papers in training set; Top 0.5%; 2.4%)
10. PeerJ (261 papers in training set; Top 5%; 2.0%)
11. BMC Medical Research Methodology (43 papers in training set; Top 0.7%; 1.6%)
12. International Journal of Medical Informatics (25 papers in training set; Top 0.9%; 1.6%)
13. npj Digital Medicine (97 papers in training set; Top 2%; 1.6%)
14. Frontiers in Public Health (140 papers in training set; Top 5%; 1.4%)
15. BMC Medicine (163 papers in training set; Top 5%; 1.3%)
16. BMC Public Health (147 papers in training set; Top 4%; 1.2%)
17. BMC Bioinformatics (383 papers in training set; Top 6%; 0.9%)
18. Patterns (70 papers in training set; Top 2%; 0.9%)
19. Influenza and Other Respiratory Viruses (44 papers in training set; Top 0.4%; 0.9%)
20. Disaster Medicine and Public Health Preparedness (16 papers in training set; Top 2%; 0.7%)
21. Epidemiology and Infection (84 papers in training set; Top 3%; 0.7%)
22. American Journal of Epidemiology (57 papers in training set; Top 2%; 0.7%)
23. mSystems (361 papers in training set; Top 8%; 0.7%)
24. Chaos, Solitons & Fractals (32 papers in training set; Top 2%; 0.7%)
25. Eurosurveillance (80 papers in training set; Top 2%; 0.7%)
26. Viruses (318 papers in training set; Top 6%; 0.7%)
27. eLife (5422 papers in training set; Top 62%; 0.6%)
28. PLOS Neglected Tropical Diseases (378 papers in training set; Top 6%; 0.6%)