
Machine Learning and Explainable AI for Multi-State Classification of Malaria Transmission Dynamics in Kenya

Gogo, J. A.; Wanyonyi, M.

2026-05-12 health informatics
10.64898/2026.05.09.26352789 medRxiv

Malaria remains a major public health challenge in sub-Saharan Africa, with pronounced spatial and temporal variation in transmission intensity that complicates effective control strategies. Accurate classification of transmission states is essential for guiding targeted interventions and strengthening early warning systems. This study develops a machine learning framework for classifying malaria transmission states in Kenya using monthly panel data from 47 counties spanning 2015 to 2025. Transmission was categorised into four operationally relevant states based on incidence thresholds. Four supervised learning models, namely multinomial logistic regression, random forest, extreme gradient boosting, and support vector machine, were trained using temporally lagged features and evaluated under a forward-chaining validation scheme to preserve temporal structure. Model performance was assessed using accuracy, macro-averaged F1 score, Matthews correlation coefficient, and Brier score, complemented by calibration analysis. Extreme gradient boosting achieved the best overall performance, with an accuracy of 0.9918, a macro-averaged F1 score of 0.9647, and a Matthews correlation coefficient of 0.9831, alongside the lowest Brier score of 0.0031, indicating highly reliable probability estimates. Feature importance analysis revealed that lagged incidence, vegetation index, precipitation, and insecticide-treated net coverage were the most influential predictors. Partial dependence analysis demonstrated nonlinear relationships and clear seasonal patterns in transmission dynamics. The findings show that machine learning approaches can accurately classify malaria transmission states while providing interpretable and well-calibrated outputs for decision making. This framework offers a practical tool for supporting malaria surveillance and resource allocation. Further validation in different epidemiological settings is recommended to assess generalisability.
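The forward chaining validation and Brier score mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the expanding-window design, the fold count, the 60-month minimum training window, and the assumption that the panel covers every month from January 2015 to December 2025 are all hypothetical choices made for illustration. The Brier score here uses the original multi-class definition (squared error summed over classes, averaged over samples); the abstract does not say which convention the study used.

```python
from typing import Iterator, List, Sequence, Tuple


def forward_chaining_splits(
    periods: Sequence[str], n_folds: int, min_train: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_periods, test_periods) pairs in which every training
    period precedes every test period (expanding-window scheme)."""
    ordered = sorted(set(periods))  # "YYYY-MM" strings sort chronologically
    remaining = len(ordered) - min_train
    if n_folds < 1 or min_train < 1 or remaining < n_folds:
        raise ValueError("not enough periods for the requested folds")
    fold_size = remaining // n_folds
    for k in range(n_folds):
        start = min_train + k * fold_size
        # The last fold absorbs any leftover periods.
        stop = len(ordered) if k == n_folds - 1 else start + fold_size
        yield ordered[:start], ordered[start:stop]


def multiclass_brier(
    probs: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean over samples of the squared distance between the predicted
    probability vector and the one-hot encoding of the true class."""
    if len(probs) != len(labels) or not labels:
        raise ValueError("probs and labels must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum(
            (p_k - (1.0 if k == y else 0.0)) ** 2 for k, p_k in enumerate(p)
        )
    return total / len(labels)


# Assumed monthly index: January 2015 through December 2025 (132 months).
months = [f"{year}-{month:02d}" for year in range(2015, 2026)
          for month in range(1, 13)]

# Hypothetical setup: 60 months of initial training data, 6 test folds.
splits = list(forward_chaining_splits(months, n_folds=6, min_train=60))
for train, test in splits:
    print(f"train {train[0]}..{train[-1]}  ->  test {test[0]}..{test[-1]}")
```

Splitting on the month index rather than on individual rows keeps all 47 counties for a given month on the same side of each split, so no model is evaluated on a month that appears in its own training data.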

Matching journals

The top 4 journals are the smallest set that covers 50% of the predicted probability mass (55.4% combined).

1 | Scientific Reports | 3102 papers in training set | Top 0.4% | 22.3%
2 | Malaria Journal | 48 papers in training set | Top 0.2% | 14.6%
3 | PLOS ONE | 4510 papers in training set | Top 16% | 12.2%
4 | Infectious Disease Modelling | 50 papers in training set | Top 0.2% | 6.3%
50% of probability mass above
5 | PLOS Digital Health | 91 papers in training set | Top 0.5% | 4.8%
6 | BMC Infectious Diseases | 118 papers in training set | Top 1% | 3.0%
7 | BMC Medicine | 163 papers in training set | Top 3% | 1.9%
8 | PLOS Computational Biology | 1633 papers in training set | Top 15% | 1.9%
9 | The American Journal of Tropical Medicine and Hygiene | 60 papers in training set | Top 2% | 1.7%
10 | PLOS Global Public Health | 293 papers in training set | Top 4% | 1.7%
11 | International Journal of Medical Informatics | 25 papers in training set | Top 0.9% | 1.7%
12 | Computers in Biology and Medicine | 120 papers in training set | Top 2% | 1.7%
13 | Parasites & Vectors | 57 papers in training set | Top 0.7% | 1.7%
14 | Frontiers in Microbiology | 375 papers in training set | Top 6% | 1.5%
15 | Epidemics | 104 papers in training set | Top 1% | 1.2%
16 | Heliyon | 146 papers in training set | Top 4% | 1.1%
17 | PLOS Neglected Tropical Diseases | 378 papers in training set | Top 4% | 0.9%
18 | BMC Medical Research Methodology | 43 papers in training set | Top 1% | 0.7%
19 | BMC Medical Informatics and Decision Making | 39 papers in training set | Top 3% | 0.7%
20 | JMIR Public Health and Surveillance | 45 papers in training set | Top 4% | 0.7%
21 | BMC Bioinformatics | 383 papers in training set | Top 7% | 0.7%
22 | Chaos, Solitons & Fractals | 32 papers in training set | Top 2% | 0.6%