Fine-grained spatial data-driven ensemble modeling for predicting Sylvatic Yellow Fever environmental suitability in Brazil

Augusto, D. A.; Abdalla, L.; Krempser, E.; de Oliveira Passos, P. H.; Garkauskas Ramos, D.; Pecego Martins Romano, A.; Chame, M.

2026-04-01 epidemiology
10.64898/2026.03.26.26349443 medRxiv

Sylvatic Yellow Fever (YF) is an infectious mosquito-borne disease of significant epidemiological relevance because of its widespread distribution and its high lethality for humans and non-human primates, particularly in tropical regions such as Brazil. Identifying regions and periods of high environmental suitability for YF occurrence is essential for preventing or mitigating its burden, as it enables the efficient allocation of surveillance, prevention, and control measures. Environmental modeling of YF occurrence has proven to be an effective approach toward this goal; however, its effectiveness depends strongly on the capabilities of the modeling framework and on the spatial and temporal precision of all associated data. We propose a fine-scale geospatial model of YF environmental suitability based on a generative machine-learning ensemble method built on a large set of high-resolution environmental covariates. First, we derive a spatiotemporal statistical description of the environment around each of the 545 YF cases recorded from 2019 to 2024, using data at up to 30 m spatial and monthly temporal resolution, within buffers of three radii: 100 m, 500 m, and 1000 m. Then, we perform feature selection and train hundreds of One-Class Support Vector Machine submodels to form a robust ensemble model, whose predictions are projected onto a 1 × 1 km grid covering Brazil under several metrics, for a total of more than seven million ensemble evaluations. The predictions ranked Southern Brazil as the region with the highest mean suitability for YF (0.64); the Southeast comes next (0.46), followed closely by the Central-West (0.44), then the North (0.39), and finally the Northeast (0.28). The model exhibited high uncertainty for the North region, indicating that data collection efforts are much needed there. As for the environmental covariates, a feature analysis showed that land use and land cover has the largest influence on the model output.
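The ensemble step described in the abstract can be sketched with scikit-learn's `OneClassSVM`. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how the submodels differ, how their outputs are combined into a suitability score, or which uncertainty metric is used. As hypothetical choices, each submodel here is fit on a random 80% subsample of standardized case covariates, suitability is the share of submodels that place a grid cell inside their learned support, uncertainty is the standard deviation of the submodels' decision scores, and regional means are plain averages over grid cells. The function names `fit_ocsvm_ensemble`, `predict_suitability`, and `regional_means` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_ocsvm_ensemble(X_presence, n_models=100, subsample=0.8, nu=0.1, seed=0):
    """Fit an ensemble of One-Class SVMs on random subsamples of case records.

    Assumption (hypothetical): submodels differ only in which subset of the
    presence records they see; the paper's resampling scheme is not specified.
    """
    # Put covariates on a common scale so the RBF kernel treats them evenly.
    scaler = StandardScaler().fit(X_presence)
    Z = scaler.transform(X_presence)
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    k = max(2, int(subsample * n))
    models = []
    for _ in range(n_models):
        idx = rng.choice(n, size=k, replace=False)  # subsample without replacement
        models.append(OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(Z[idx]))
    return scaler, models


def predict_suitability(scaler, models, X_grid):
    """Return (suitability, uncertainty) for each grid cell.

    suitability: fraction of submodels predicting +1 (inside the learned support).
    uncertainty: standard deviation of the submodels' decision-function values.
    """
    Z = scaler.transform(X_grid)
    votes = np.column_stack([m.predict(Z) == 1 for m in models])
    scores = np.column_stack([m.decision_function(Z) for m in models])
    return votes.mean(axis=1), scores.std(axis=1)


def regional_means(values, regions):
    """Average a per-cell quantity within each region label."""
    return {str(r): float(values[regions == r].mean()) for r in np.unique(regions)}
```

For example, with synthetic standard-normal covariates for 200 cases, a cell at the centre of the case distribution receives a high vote share, while a cell far outside it receives a share near zero. A vote share is used rather than averaging raw decision scores because separately fitted SVMs do not produce scores on a common scale.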

Matching journals

The top 11 journals account for 50% of the predicted probability mass.

Rank  Probability  Percentile  Training papers  Journal
1     10.5%        Top 17%     4510             PLOS ONE
2     10.2%        Top 6%      3102             Scientific Reports
3     6.4%         Top 6%      1633             PLOS Computational Biology
4     4.3%         Top 0.4%    32               Chaos, Solitons & Fractals
5     4.2%         Top 0.3%    50               Infectious Disease Modelling
6     4.0%         Top 0.1%    10               GeoHealth
7     2.9%         Top 0.6%    104              Epidemics
8     2.4%         Top 2%      60               The American Journal of Tropical Medicine and Hygiene
9     2.1%         Top 4%      140              Frontiers in Public Health
10    1.9%         Top 3%      378              PLOS Neglected Tropical Diseases
11    1.7%         Top 0.1%    10               Spatial and Spatio-temporal Epidemiology
(50% of probability mass above this line)
12    1.7%         Top 0.3%    20               Frontiers in Physics
13    1.7%         Top 0.9%    48               Malaria Journal
14    1.7%         Top 3%      179              Science of The Total Environment
15    1.5%         Top 2%      146              Heliyon
16    1.5%         Top 53%     4913             Nature Communications
17    1.5%         Top 3%      318              Viruses
18    1.3%         Top 0.1%    10               Infectious Diseases of Poverty
19    1.3%         Top 1%      70               Patterns
20    1.3%         Top 0.2%    12               Landscape Ecology
21    1.2%         Top 4%      293              PLOS Global Public Health
22    1.1%         Top 3%      45               JMIR Public Health and Surveillance
23    1.1%         Top 0.9%    57               Parasites & Vectors
24    1.0%         Top 2%      34               IEEE Journal of Biomedical and Health Informatics
25    0.9%         Top 0.3%    10               Frontiers in Applied Mathematics and Statistics
26    0.8%         Top 3%      97               npj Digital Medicine
27    0.8%         Top 1%      25               International Journal of Medical Informatics
28    0.8%         Top 2%      174              Scientific Data
29    0.8%         Top 4%      120              Computers in Biology and Medicine
30    0.8%         Top 0.7%    11               Quantitative Biology
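The statement that the top 11 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages until the running total reaches 50%. This sketch uses the rounded values from the listing above, so the total it reports is approximate; the helper name `journals_needed` is illustrative.

```python
# Rounded probabilities (in %) of the highest-ranked journals, copied from the listing above.
TOP_PROBS = [10.5, 10.2, 6.4, 4.3, 4.2, 4.0, 2.9, 2.4, 2.1, 1.9, 1.7, 1.7, 1.7, 1.7]


def journals_needed(probs_percent, target=50.0):
    """Return (k, cumulative) for the smallest k whose top-k sum reaches target."""
    total = 0.0
    for k, p in enumerate(probs_percent, start=1):
        total += p
        if total >= target:
            return k, total
    raise ValueError("listed probabilities never reach the target")


k, mass = journals_needed(TOP_PROBS)
print(f"top {k} journals hold {mass:.1f}% of the mass")  # prints: top 11 journals hold 50.6% of the mass
```

The first ten journals sum to 48.9%, so the eleventh (1.7%) is the one that pushes the cumulative mass past 50%.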