Climate-Informed Deep Learning for Spatio-Temporal Forecasting of Climate-Sensitive Diseases
Tegenaw, G. S.; Degu, M. Z.; Gebeyehu, W. B.; Senay, A. B.; Krishnamoorthy, J.; Ward, T.; Simegn, G. L.
Effective public health planning and intervention strategies require an understanding of the temporal and geographic distribution of disease incidence, which in turn requires robust frameworks for forecasting it. However, because case counts and temporal dynamics vary widely, characterizing the distinct patterns of climate-sensitive diseases, including their hotspots, trends, and seasonal variations, poses significant challenges. Furthermore, most studies focus on directly predicting future incidence from historical patterns and covariates, and a significant gap remains between methodological proliferation and epidemiological reality. On one side, diverse architectures are trained and validated on standardized, statistically stable benchmark datasets; on the other, real surveillance data are often irregular, sparse, and highly skewed, with rare but high-magnitude or bimodally distributed incidence. Consequently, traditional end-to-end approaches that map climate and disease data directly often overfit and generalize poorly in these data-scarce settings. To better understand disease epidemiology and mitigate its impact, we analyzed a decade of retrospective data from Ethiopia to examine how climate and weather conditions influence the incidence and spread of climate-sensitive diseases, including malaria and dysentery. In this study, we propose a two-stage hybrid framework, a climate-informed disease prediction model, that forecasts the likelihood of disease incidence from decades of climate and weather data. First, deep learning models capture latent weather dynamics. Then, a hurdle model based on Extreme Gradient Boosting (XGB) handles the zero-inflated incidence data: an XGBClassifier predicts whether incidence occurs and an XGBRegressor estimates its magnitude, both based on these weather dynamics.
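The hurdle stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `HurdleModel`, the hyperparameters, and the choice to combine the two parts as an expected value (probability of incidence times predicted size) are all hypothetical. It uses XGBoost's scikit-learn-style `XGBClassifier` and `XGBRegressor` when the `xgboost` package is installed and falls back to scikit-learn's gradient boosting otherwise; the synthetic features stand in for the forecast weather variables.

```python
import numpy as np

try:  # XGBoost's scikit-learn wrappers, as named in the abstract
    from xgboost import XGBClassifier as Classifier, XGBRegressor as Regressor
except ImportError:  # assumption: fall back to scikit-learn so the sketch still runs
    from sklearn.ensemble import (
        GradientBoostingClassifier as Classifier,
        GradientBoostingRegressor as Regressor,
    )


class HurdleModel:
    """Hypothetical two-part (hurdle) model for zero-inflated incidence.

    Part 1: a classifier estimates P(incidence > 0 | features).
    Part 2: a regressor, trained only on rows with incidence > 0,
            estimates the incidence size given that it occurs.
    """

    def __init__(self, n_estimators=200, max_depth=3):
        # Assumed hyperparameters; in practice these would be tuned per target.
        self.clf = Classifier(n_estimators=n_estimators, max_depth=max_depth)
        self.reg = Regressor(n_estimators=n_estimators, max_depth=max_depth)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        occurred = (y > 0).astype(int)
        self.clf.fit(X, occurred)  # the "hurdle": zero vs. non-zero incidence
        self.reg.fit(X[occurred == 1], y[occurred == 1])  # size, given incidence
        return self

    def predict_proba(self, X):
        # Probability that incidence occurs (column 1 corresponds to class 1).
        return self.clf.predict_proba(np.asarray(X, dtype=float))[:, 1]

    def predict_size(self, X):
        # Conditional incidence size, clipped so predictions cannot go negative.
        return np.clip(self.reg.predict(np.asarray(X, dtype=float)), 0.0, None)

    def predict(self, X, threshold=None):
        p = self.predict_proba(X)
        size = self.predict_size(X)
        if threshold is None:
            return p * size  # expected incidence: P(y > 0) * E[y | y > 0]
        return np.where(p >= threshold, size, 0.0)  # hard-gated alternative


# Usage on synthetic zero-inflated data (features are illustrative stand-ins).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 3))
y = np.where(X[:, 0] > 0, 10 + 5 * np.abs(X[:, 1]), 0.0)
model = HurdleModel().fit(X, y)
print(model.predict(X[:5]))
```

The expected-value combination keeps predictions smooth in the probability of incidence, while passing a `threshold` gives the hard "occurs or not" gating; which of the two the paper uses is not stated in the abstract.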
Our proposed multivariate climate-driven disease incidence model incorporates spatial factors (elevation, coordinates), temporal factors (year, month), and key weather parameters (precipitation, sunlight, wind, relative humidity, temperature) to predict the likelihood of multiple diseases occurring in each area, serving as a foundation for future disease incidence prediction in the region. Across 72 evaluated experiments spanning four categories and six targets, pairwise Diebold-Mariano tests showed that, for climate variable forecasting, the Transformer model achieved the highest number of statistically significant wins (n=18, 25.0%), compared with Long Short-Term Memory (LSTM) (n=9, 12.5%) and the Temporal Convolutional Neural Network (TCN) (n=5, 6.9%). The hurdle model combining XGBClassifier and XGBRegressor outperformed the baseline in both malaria and dysentery forecasting. Error stratification showed that the hurdle model achieved a substantially lower Mean Absolute Error (MAE) than the baseline in both incidence and non-incidence periods, with the greatest benefit during incidence periods. Our modular pipeline first forecasts climate variables and then predicts disease incidence, which enhances interpretability and generalization in data-sparse settings. Overall, this approach provides a scalable, climate-aware forecasting tool for public health planning, particularly in regions where these diseases are endemic, where climate change may affect their prevalence, or where data are scarce.
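The two evaluation tools mentioned above, the pairwise Diebold-Mariano test and MAE stratified by incidence and non-incidence periods, can be sketched as follows. This is an illustrative sketch under assumptions: it uses a squared-error loss and the plain normal-approximation form of the Diebold-Mariano statistic (without the Harvey-Leybourne-Newbold small-sample correction), since the abstract does not state which loss or variant was used. The function names `diebold_mariano` and `stratified_mae` are hypothetical, and "incidence periods" are assumed to be those with observed incidence above zero.

```python
import math

import numpy as np


def diebold_mariano(y_true, pred_a, pred_b, h=1, power=2):
    """Diebold-Mariano test of equal predictive accuracy for two forecasts.

    Loss differential d_t = |e_a,t|^power - |e_b,t|^power, so a negative
    statistic means forecast A has lower loss. Uses autocovariances of d up
    to lag h-1 (h-step-ahead forecasts) and a normal approximation.
    Returns (statistic, two-sided p-value).
    """
    y_true = np.asarray(y_true, dtype=float)
    e_a = y_true - np.asarray(pred_a, dtype=float)
    e_b = y_true - np.asarray(pred_b, dtype=float)
    d = np.abs(e_a) ** power - np.abs(e_b) ** power
    T = d.size
    dc = d - d.mean()
    # Sample autocovariances of the loss differential at lags 0..h-1.
    gamma = [np.dot(dc[k:], dc[: T - k]) / T for k in range(h)]
    long_run_var = gamma[0] + 2.0 * sum(gamma[1:])
    if long_run_var <= 0:  # e.g. identical losses: statistic undefined, report no difference
        return 0.0, 1.0
    stat = d.mean() / math.sqrt(long_run_var / T)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|stat|))
    return float(stat), float(p_value)


def stratified_mae(y_true, y_pred):
    """MAE overall and separately for incidence (y > 0) and non-incidence (y == 0) periods."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    incidence = y_true > 0
    return {
        "overall": float(abs_err.mean()),
        "incidence": float(abs_err[incidence].mean()) if incidence.any() else float("nan"),
        "non_incidence": float(abs_err[~incidence].mean()) if (~incidence).any() else float("nan"),
    }
```

In a pairwise comparison, one could count a "win" for model A when the statistic is negative and the p-value falls below a chosen significance level such as 0.05; the exact win criterion used in the paper is not given in the abstract.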