
Horizon-dependent forecast ranking under structural change: a rolling-origin benchmark for global COVID-19 incidence

Sesay, M. M.; Wembo, M. S.

2026-03-12 · epidemiology
medRxiv · doi:10.64898/2026.03.11.26348121

Short-horizon epidemic forecasting is difficult when surveillance series are highly nonstationary and affected by structural change and evolving reporting conditions. This study evaluates statistical models for global daily COVID-19 incidence using a rolling-origin benchmark designed to approximate real-time forecasting under such conditions. Using global incidence data from 22 January to 27 July 2020, we compare naive, seasonal naive, drift, ARIMA(log1p), ETS(log1p), and Prophet(log1p) forecasts at horizons h ∈ {1, 3, 7, 14} days. Structural phases are identified retrospectively on a variance-stabilized scale and used only to stratify forecast errors.

Forecast ranking is strongly horizon-dependent. In the full-sample benchmark, drift performs best at the 1-, 7-, and 14-day horizons, while seasonal naive performs best at 3 days. Among the transformed statistical models, ARIMA(log1p) is competitive at short horizons, whereas ETS(log1p) becomes stronger at 7 and 14 days. Diebold-Mariano tests confirm that several of these differences are statistically meaningful, particularly in favor of drift at short and long horizons and in favor of ETS(log1p) over ARIMA(log1p) at longer horizons. Prophet(log1p) is not competitive in point forecasting and achieves high nominal interval coverage mainly through very wide prediction intervals.

Robustness analyses show that the main ranking patterns are broadly stable to alternative segmentation settings, training-window policies, coverage-stabilized subsamples, and alternative target construction based on cumulative confirmed counts. Overall, the results show that simple baselines remain difficult to outperform in epidemic surveillance data and that horizon-specific rolling evaluation is essential for credible forecast comparison under structural change.

Author summary

Forecasting infectious disease incidence is difficult when case data change rapidly over time and when reporting systems are still evolving. In this study, I examined how several common statistical forecasting models perform on global daily COVID-19 incidence during the early pandemic. Rather than asking which model is best overall, I focused on whether model ranking changes across forecast horizons and whether those conclusions remain stable under different evaluation choices. I compared simple baselines, including naive, seasonal naive, and drift forecasts, with ARIMA, exponential smoothing, and Prophet models using a rolling-origin benchmark that mimics real-time forecasting. I found that forecast ranking depends strongly on the horizon: drift performed best at 1, 7, and 14 days, while seasonal naive performed best at 3 days. Among the transformed statistical models, ARIMA was more competitive at shorter horizons, whereas exponential smoothing was stronger at longer horizons. I also found that these conclusions remained broadly stable under alternative segmentation settings, training windows, coverage-stabilized subsamples, and target definitions. These results show that simple baselines can remain highly competitive in epidemic surveillance data and that horizon-specific evaluation is essential for fair forecast comparison under structural change.
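The rolling-origin scheme and the three simple baselines described above can be sketched in a few lines. This is an illustrative re-implementation on a generic one-dimensional series, not the paper's code; the function name, the minimum training length, and the weekly season length are assumptions chosen for the sketch.

```python
import numpy as np

def rolling_origin_mae(y, horizons=(1, 3, 7, 14), min_train=30, season=7):
    """Rolling-origin backtest of three baselines on a 1-D series y.

    Baselines (textbook definitions, not the paper's implementation):
      naive  : last observed value
      snaive : value one (or more) seasonal periods of `season` days back
      drift  : last value plus h times the mean historical increment
    Returns {model: {horizon: MAE}}.
    """
    y = np.asarray(y, dtype=float)
    errors = {m: {h: [] for h in horizons} for m in ("naive", "snaive", "drift")}
    # each forecast origin uses only data strictly before it (expanding window)
    for origin in range(min_train, len(y) - max(horizons)):
        train = y[:origin]
        slope = (train[-1] - train[0]) / (len(train) - 1)  # drift increment
        for h in horizons:
            actual = y[origin + h - 1]
            errors["naive"][h].append(abs(actual - train[-1]))
            # seasonal naive: repeat the value from the matching weekday
            errors["snaive"][h].append(abs(actual - train[-season + (h - 1) % season]))
            errors["drift"][h].append(abs(actual - (train[-1] + h * slope)))
    return {m: {h: float(np.mean(e)) for h, e in hs.items()}
            for m, hs in errors.items()}
```

On a perfectly linear series the drift forecast is exact at every horizon, while the naive forecast lags by exactly h units, which makes the scheme easy to sanity-check before applying it to real incidence data.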

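The Diebold-Mariano comparisons mentioned in the abstract follow a standard recipe that can also be sketched briefly. This is the generic textbook version with absolute-error loss and a rectangular lag window of h-1 autocovariances, not necessarily the paper's exact configuration, and it omits small-sample corrections such as Harvey's adjustment.

```python
import math
import numpy as np

def diebold_mariano(e1, e2, h=1):
    """Two-sided Diebold-Mariano test for equal predictive accuracy.

    e1, e2 : forecast errors of two competing models on the same targets.
    h      : forecast horizon; the variance of the mean loss differential
             uses autocovariances up to lag h-1, the conventional choice
             for h-step-ahead forecasts.
    Returns (statistic, p-value); a negative statistic favours model 1
    under absolute-error loss. Assumes the estimated variance is positive.
    """
    d = np.abs(np.asarray(e1)) - np.abs(np.asarray(e2))  # loss differential
    n = len(d)
    dbar = d.mean()
    # autocovariances of d at lags 0 .. h-1
    gamma = [float(np.mean((d[: n - k] - dbar) * (d[k:] - dbar)))
             for k in range(h)]
    var = (gamma[0] + 2.0 * sum(gamma[1:])) / n  # long-run variance of dbar
    stat = dbar / math.sqrt(var)
    pval = math.erfc(abs(stat) / math.sqrt(2.0))  # 2 * P(Z > |stat|)
    return stat, pval
```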
Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology: 14.9% (1633 papers in training set, Top 2%)
2. PLOS ONE: 12.5% (4510 papers in training set, Top 15%)
3. Epidemics: 6.4% (104 papers in training set, Top 0.2%)
4. Journal of The Royal Society Interface: 4.9% (189 papers in training set, Top 0.7%)
5. Scientific Reports: 4.9% (3102 papers in training set, Top 23%)
6. Royal Society Open Science: 4.4% (193 papers in training set, Top 0.3%)
7. Infectious Disease Modelling: 4.0% (50 papers in training set, Top 0.4%)
(50% of probability mass above)
8. BMC Medical Research Methodology: 3.6% (43 papers in training set, Top 0.3%)
9. Epidemiology: 2.9% (26 papers in training set, Top 0.2%)
10. Journal of Medical Internet Research: 2.8% (85 papers in training set, Top 2%)
11. International Journal of Infectious Diseases: 1.8% (126 papers in training set, Top 1%)
12. Biology Methods and Protocols: 1.7% (53 papers in training set, Top 0.9%)
13. PeerJ: 1.7% (261 papers in training set, Top 7%)
14. Proceedings of the National Academy of Sciences: 1.7% (2130 papers in training set, Top 32%)
15. PLOS Genetics: 1.5% (756 papers in training set, Top 10%)
16. Wellcome Open Research: 1.2% (57 papers in training set, Top 1%)
17. Statistics in Medicine: 1.2% (34 papers in training set, Top 0.2%)
18. JMIR Public Health and Surveillance: 1.0% (45 papers in training set, Top 3%)
19. American Journal of Epidemiology: 0.9% (57 papers in training set, Top 1%)
20. BMC Infectious Diseases: 0.9% (118 papers in training set, Top 4%)
21. Frontiers in Artificial Intelligence: 0.9% (18 papers in training set, Top 0.6%)
22. Journal of Theoretical Biology: 0.9% (144 papers in training set, Top 1%)
23. Bioinformatics: 0.8% (1061 papers in training set, Top 9%)
24. Methods in Ecology and Evolution: 0.7% (160 papers in training set, Top 2%)
25. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences: 0.7% (12 papers in training set, Top 0.1%)
26. PNAS Nexus: 0.7% (147 papers in training set, Top 2%)
27. Nature Communications: 0.7% (4913 papers in training set, Top 65%)
28. International Journal of Epidemiology: 0.5% (74 papers in training set, Top 3%)
29. BMC Medicine: 0.5% (163 papers in training set, Top 9%)
30. BMC Bioinformatics: 0.5% (383 papers in training set, Top 8%)