Machine Learning and Explainable AI for Multi-State Classification of Malaria Transmission Dynamics in Kenya
Gogo, J. A.; Wanyonyi, M.
Malaria remains a major public health challenge in sub-Saharan Africa, with pronounced spatial and temporal variation in transmission intensity that complicates effective control strategies. Accurate classification of transmission states is essential for guiding targeted interventions and strengthening early warning systems. This study develops a machine learning framework for classifying malaria transmission states in Kenya using monthly panel data from 47 counties spanning 2015 to 2025. Transmission was categorised into four operationally relevant states based on incidence thresholds. Four supervised learning models, namely multinomial logistic regression, random forest, extreme gradient boosting, and support vector machine, were trained on temporally lagged features and evaluated under a forward-chaining validation scheme to preserve temporal structure. Model performance was assessed using accuracy, macro-averaged F1 score, Matthews correlation coefficient, and Brier score, complemented by calibration analysis. Extreme gradient boosting achieved the best overall performance, with an accuracy of 0.9918, a macro-averaged F1 score of 0.9647, and a Matthews correlation coefficient of 0.9831, alongside the lowest Brier score of 0.0031, indicating highly reliable probability estimates. Feature importance analysis revealed that lagged incidence, vegetation index, precipitation, and insecticide-treated net coverage were the most influential predictors. Partial dependence analysis demonstrated nonlinear relationships and clear seasonal patterns in transmission dynamics. The findings show that machine learning approaches can accurately classify malaria transmission states while providing interpretable and well-calibrated outputs for decision-making. This framework offers a practical tool for supporting malaria surveillance and resource allocation. Further validation in different epidemiological settings is recommended to assess generalisability.
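As a rough illustration of two ideas named in the abstract, the sketch below shows one way forward-chaining splits and a multiclass Brier score could be computed. It is not the authors' code: the function names, the `min_train` and `horizon` parameters, and the choice to sum squared errors over classes (rather than average them) are all assumptions made for this example, and the abstract does not state which Brier definition was used.

```python
from typing import Iterator, List, Sequence, Tuple


def forward_chaining_splits(
    n_periods: int, min_train: int, horizon: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_periods, test_periods) pairs of time indices.

    The training window always ends before the test window begins, and it
    grows by `horizon` periods at each step (hypothetical parameter names).
    """
    end = min_train
    while end + horizon <= n_periods:
        yield list(range(end)), list(range(end, end + horizon))
        end += horizon


def multiclass_brier(
    probs: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean squared distance between predicted probabilities and one-hot labels.

    Assumption: squared errors are summed over classes for each sample,
    then averaged over samples.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


if __name__ == "__main__":
    # Six monthly periods, with at least three months of training data.
    for train, test in forward_chaining_splits(n_periods=6, min_train=3):
        print("train:", train, "test:", test)

    # Two samples over four transmission states (illustrative values only).
    example_probs = [[0.7, 0.2, 0.1, 0.0], [0.1, 0.1, 0.6, 0.2]]
    example_labels = [0, 2]
    print("Brier:", multiclass_brier(example_probs, example_labels))
```

Because each test window lies strictly after its training window, lagged features built from past months cannot leak future information into training, which is the property the forward-chaining scheme is meant to preserve.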
Matching journals
The top 4 journals account for 50% of the predicted probability mass.