Spatio-temporal machine learning for multi-horizon prediction of bluetongue outbreaks

Devlin, L. M.; Nguyen, P. H.; Cuthbert, R.; Doan, P. N.; Tran, V. H.; Zhang, Z.; Murchie, A. K.; Bamford, C. G. G.; Dick, J. T. A.; Morgan, E. R.; Mai, T. S.

2026-05-24 ecology
10.64898/2026.05.21.726753 bioRxiv
Reliable early warning of infectious disease outbreaks remains a major challenge for surveillance systems, particularly for vector-borne pathogens whose transmission depends on interactions among hosts, vectors, and climate-sensitive environmental conditions. Data-driven forecasting offers a promising approach for predicting outbreak risk using surveillance and environmental data. This study develops a logit-weighted ensemble (LWE), a machine-learning framework that predicts outbreak occurrence 1-6 months ahead at the administrative unit-month scale using routinely available outbreak notifications and gridded climate data. Bluetongue virus (BTV), an arbovirus of ruminants transmitted by Culicoides biting midges, provides a well-characterised system in which transmission is strongly shaped by climate, making it a useful system for applying and testing this approach. The framework is evaluated using surveillance data collected between 2005 and 2024 from France, Greece, and Italy, selected for their long-running and high-quality outbreak surveillance records. Across all three countries, the LWE achieved the strongest and most stable predictive performance under a recall-focused evaluation that prioritises correctly identifying outbreak months. It outperformed or matched 14 benchmark models, with differences becoming more pronounced at longer lead times (month +3 onward), when predictions are more uncertain and outbreaks are relatively rare. Predictability varied across countries, with the highest performance in Greece, strong performance in France, and lower, more variable performance in Italy, reflecting differences in how consistently outbreaks occur and spread across regions. Overall, the results demonstrate that horizon-aware, climate-informed forecasting can reliably identify months and locations at elevated risk of outbreak occurrence up to six months in advance, supporting surveillance planning and preparedness across heterogeneous European settings. 
The ensemble framework provides a robust and portable strategy for outbreak prediction using routinely collected surveillance and environmental data.

Author Summary

Predicting infectious disease outbreaks before they occur remains a major challenge, particularly for diseases influenced by environmental conditions. In this study, we focus on bluetongue, a viral disease of livestock transmitted by biting midges, where transmission is strongly affected by climate and seasonal patterns. We develop a method that uses routinely collected outbreak reports and climate data to estimate where and when outbreaks are more likely to occur, up to six months in advance. We apply this approach across three European countries with a history of bluetongue outbreaks. We find that combining climate information with recent outbreak patterns can provide useful early signals of increased risk. Predictions are most accurate at shorter timeframes, but longer-range forecasts can still support planning and preparedness. Because our approach uses widely available data, it could be applied in other regions or to similar environmentally driven diseases. However, it does not include factors such as vaccination, animal movement, or detailed information on vector populations, which may also influence how outbreaks develop.

Graphical Abstract (Figure 1)
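The abstract does not say how the logit-weighted ensemble combines its member models, so the following is only a minimal sketch of one common reading: member probabilities are averaged in logit space with normalised weights, then mapped back to a probability. The function names, the example probabilities, and the weights are all hypothetical and are not taken from the paper. A plain recall function is included because the evaluation is described as recall-focused.

```python
import math

def logit(p, eps=1e-6):
    # Clip to avoid infinities at exactly 0 or 1 (eps is an assumed safeguard).
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_weighted_ensemble(member_probs, weights):
    """Combine member probabilities by a weighted average of their logits.

    member_probs: list of lists, one list of probabilities per member model.
    weights: one non-negative weight per member (normalised here to sum to 1).
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n_samples = len(member_probs[0])
    combined = []
    for j in range(n_samples):
        z = sum(w * logit(member[j]) for w, member in zip(norm, member_probs))
        combined.append(sigmoid(z))
    return combined

def recall(y_true, y_pred):
    # Fraction of true outbreak unit-months that were flagged.
    positives = sum(y_true)
    if positives == 0:
        return float("nan")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    return hits / positives

# Hypothetical outbreak probabilities from three members for four unit-months.
members = [
    [0.10, 0.60, 0.30, 0.85],
    [0.20, 0.70, 0.25, 0.90],
    [0.05, 0.40, 0.50, 0.80],
]
ensemble = logit_weighted_ensemble(members, weights=[0.5, 0.3, 0.2])
flags = [p >= 0.5 for p in ensemble]  # 0.5 is an assumed decision threshold
```

In a multi-horizon setting, one could fit a separate weight vector for each lead time (month +1 to month +6), but whether the authors do so is not stated in the abstract.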

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology: 1633 papers in training set, Top 0.6%, 22.8%
2. Scientific Reports: 3102 papers in training set, Top 4%, 12.5%
3. PLOS ONE: 4510 papers in training set, Top 24%, 6.9%
4. Epidemics: 104 papers in training set, Top 0.2%, 6.5%
5. Nature Communications: 4913 papers in training set, Top 37%, 4.0%
50% of probability mass above
6. Patterns: 70 papers in training set, Top 0.2%, 3.7%
7. Journal of The Royal Society Interface: 189 papers in training set, Top 1%, 3.6%
8. Methods in Ecology and Evolution: 160 papers in training set, Top 0.9%, 3.3%
9. Transboundary and Emerging Diseases: 34 papers in training set, Top 0.2%, 2.6%
10. Environmental Research Letters: 15 papers in training set, Top 0.2%, 2.1%
11. iScience: 1063 papers in training set, Top 9%, 2.1%
12. PLOS Neglected Tropical Diseases: 378 papers in training set, Top 3%, 2.1%
13. Movement Ecology: 18 papers in training set, Top 0.2%, 1.7%
14. Remote Sensing in Ecology and Conservation: 10 papers in training set, Top 0.2%, 1.5%
15. Viruses: 318 papers in training set, Top 3%, 1.3%
16. Bioinformatics Advances: 184 papers in training set, Top 3%, 1.3%
17. Preventive Veterinary Medicine: 14 papers in training set, Top 0.2%, 1.3%
18. Ecological Informatics: 29 papers in training set, Top 0.5%, 1.2%
19. Royal Society Open Science: 193 papers in training set, Top 3%, 1.2%
20. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 40%, 1.0%
21. One Health: 29 papers in training set, Top 1%, 0.8%
22. eLife: 5422 papers in training set, Top 57%, 0.8%
23. Communications Biology: 886 papers in training set, Top 28%, 0.7%
24. BMC Infectious Diseases: 118 papers in training set, Top 6%, 0.7%
25. Frontiers in Public Health: 140 papers in training set, Top 9%, 0.5%
26. PeerJ: 261 papers in training set, Top 19%, 0.5%
27. Ecological Applications: 28 papers in training set, Top 0.9%, 0.5%
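
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is illustrative only; the helper name `journals_to_reach` is hypothetical, and the values are the first seven probabilities from the list above.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative mass) at the first rank whose running
    total reaches the threshold, or None if it is never reached."""
    running = 0.0
    for count, p in enumerate(probabilities, start=1):
        running += p
        if running >= threshold:
            return count, running
    return None

# Predicted probabilities (%) for ranks 1 to 7, copied from the list.
top_probs = [22.8, 12.5, 6.9, 6.5, 4.0, 3.7, 3.6]
count, mass = journals_to_reach(top_probs, 50.0)
print(count, round(mass, 1))
```

The running total is 48.7% after four journals and about 52.7% after five, which is why the cutoff sits after rank 5.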