Pediatric HIV Hotspots in Kenya: Machine Learning and Geostatistical Analysis for Enhanced Case Finding
Onovo, A. A.; Omoro, G.; Maswai, J.; Owuoth, J.; Kirui, D.; Odero, L.; Makone, B.; Miruka, F.; Obat, E.; Yegon, P.
Background: Kenya's HIV programme has long prioritized high-burden counties for intensified paediatric interventions, yet integrated analytic frameworks that objectively predict and validate paediatric HIV burden with data-driven models are still lacking. We therefore developed and tested a framework that combines machine-learning (ML) prediction with geostatistical hotspot analysis to strengthen data-driven surveillance and resource targeting. Here, a hotspot denotes a statistically significant spatial cluster of elevated paediatric HIV cases.

Methods: National HIV testing data for children aged 0-14 years were analysed together with indicators from the 2022 Kenya Demographic and Health Survey. Multiple supervised ML algorithms were trained to predict the number of children living with HIV (CLHIV) across Kenya's 47 counties. Model performance was evaluated using root-mean-square error (RMSE) and mean absolute error (MAE). The tuned Lasso regression model demonstrated the best predictive accuracy and generated county-level estimates for October 2022 to June 2023. These predictions were subsequently assessed for spatial autocorrelation (Moran's I) and validated using Getis-Ord Gi* statistics.

Findings: The model predicted 3160 newly identified CLHIV during the study period, compared with 3092 cases reported nationally. To account for differences in county population size, paediatric HIV incidence was calculated as cases per 10,000 children aged 0-14 years, using 2023 census projections as the denominator. Incidence-based choropleth maps showed that the highest reported burden was concentrated in Isiolo (11.2 per 10,000) and western Kenya (Homa Bay 7.7, Kisumu 3.6, Siaya 3.5), while model predictions identified additional high-incidence counties in eastern and northern regions. Significant spatial clustering was confirmed for both the reported (z = 3.23, Moran's I = 0.22, p = 0.001) and predicted (z = 4.92, Moran's I = 0.37, p < 0.001) distributions. Thirteen counties, predominantly in western Kenya, were identified as statistically significant hotspots.

Interpretation: This study presents a validated methodological framework integrating ML prediction with geostatistical analysis for paediatric HIV surveillance. By expressing model outputs as population-adjusted incidence, the framework enables equitable comparison of paediatric HIV burden across counties of differing size, strengthening the evidence base for geographic prioritization and resource allocation.

Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
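The spatial statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not state which software or spatial weights were used, so the binary adjacency matrix, the four-area toy data, and the function names (`incidence_per_10k`, `morans_i`, `getis_ord_gi_star`) are all assumptions chosen for illustration. The sketch computes cases per 10,000 population, global Moran's I, and Getis-Ord Gi* z-scores using the standard textbook formulas.

```python
from math import sqrt


def incidence_per_10k(cases, population):
    """Cases per 10,000 population."""
    return 10_000 * cases / population


def morans_i(x, w):
    """Global Moran's I for values x and spatial weights w (w[i][i] == 0)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def getis_ord_gi_star(x, w):
    """Gi* z-scores; each area counts as its own neighbour with weight 1."""
    n = len(x)
    mean = sum(x) / n
    s = sqrt(sum(v * v for v in x) / n - mean ** 2)
    scores = []
    for i in range(n):
        wi = [1.0 if j == i else w[i][j] for j in range(n)]
        sum_w = sum(wi)
        sum_w2 = sum(v * v for v in wi)
        num = sum(wj * xj for wj, xj in zip(wi, x)) - mean * sum_w
        den = s * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(num / den)
    return scores


# Hypothetical toy data: four areas in a row, NOT values from the study.
cases = [10, 20, 30, 40]
population = [100_000] * 4
# Assumed binary adjacency: each area neighbours the next one in the row.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

rates = [incidence_per_10k(c, p) for c, p in zip(cases, population)]
global_i = morans_i(rates, adjacency)
gi_star = getis_ord_gi_star(rates, adjacency)
```

In this toy layout the rates rise steadily from one end of the row to the other, so Moran's I is positive and the Gi* scores are negative at the low end and positive at the high end. With only four areas the scores are far from any conventional significance threshold; a real analysis would use all 47 counties and a permutation or normal-approximation test, for which the abstract gives results but no procedural detail.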