
Pediatric HIV Hotspots in Kenya: Machine Learning and Geostatistical Analysis for Enhanced Case Finding

Onovo, A. A.; Omoro, G.; Maswai, J.; Owuoth, J.; Kirui, D.; Odero, L.; Makone, B.; Miruka, F.; Obat, E.; Yegon, P.

2026-04-27 public and global health
10.64898/2026.04.24.26351710 medRxiv

Background: Although Kenya's HIV programme has long prioritized high-burden counties for intensified paediatric interventions, a critical evidence gap remains: there are no integrated analytic frameworks that objectively predict and validate paediatric HIV burden using data-driven models. We therefore developed and tested a framework that combines machine-learning (ML) prediction with geostatistical hotspot analysis to strengthen data-driven surveillance and resource targeting. Here, a hotspot denotes a statistically significant spatial cluster of elevated paediatric HIV cases.

Methods: National HIV testing data for children aged 0-14 years were analysed together with indicators from the 2022 Kenya Demographic and Health Survey. Multiple supervised ML algorithms were trained to predict the number of children living with HIV (CLHIV) across Kenya's 47 counties. Model performance was evaluated using root-mean-square error (RMSE) and mean absolute error (MAE). The tuned Lasso regression model demonstrated the best predictive accuracy and generated county-level estimates for October 2022 to June 2023. These predictions were subsequently assessed for spatial autocorrelation (Moran's I) and validated using Getis-Ord Gi* statistics.

Findings: The model predicted 3160 newly identified CLHIV during the study period, compared with 3092 cases reported nationally. To account for differences in county population size, paediatric HIV incidence was calculated as cases per 10,000 children aged 0-14 years, using 2023 census projections as the denominator. Incidence-based choropleth maps revealed that the highest reported burden was concentrated in Isiolo (11.2 per 10,000) and western Kenya (Homa Bay 7.7, Kisumu 3.6, Siaya 3.5), while model predictions identified additional high-incidence counties in eastern and northern regions. Significant spatial clustering was confirmed for both the reported (z = 3.23, Moran's I = 0.22, p = 0.001) and predicted (z = 4.92, Moran's I = 0.37, p < 0.001) distributions. Thirteen counties, predominantly in western Kenya, were identified as statistically significant hotspots.

Interpretation: This study presents a validated methodological framework integrating ML prediction with geostatistical analysis for paediatric HIV surveillance. By expressing model outputs as population-adjusted incidence, the framework enables equitable comparison of paediatric HIV burden across counties of differing size, strengthening the evidence base for geographic prioritization and resource allocation.

Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
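The spatial statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the four-region toy rates, and the binary adjacency weights are all hypothetical, and the sketch returns only the raw statistics (global Moran's I and per-region Getis-Ord Gi* scores) without the z-scores or p-values reported above, which would need an additional permutation or normal-approximation step.

```python
import math


def incidence_per_10k(cases, population):
    """Population-adjusted incidence: cases per 10,000 children."""
    return cases * 10_000 / population


def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* score per region; weights must include self (w_ii > 0)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sw = sum(w)
        sw2 = sum(x * x for x in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sw
        denominator = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical toy data: four regions arranged in a line (0-1-2-3),
# with incidence rising from one end to the other.
rates = [1.0, 2.0, 3.0, 4.0]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Gi* conventionally counts each region as its own neighbour.
with_self = [[1 if i == j else adjacency[i][j] for j in range(4)]
             for i in range(4)]

global_i = morans_i(rates, adjacency)
gi_star = getis_ord_gi_star(rates, with_self)
print(f"Moran's I = {global_i:.3f}")
print("Gi* scores:", [round(g, 3) for g in gi_star])
```

In this toy layout, similar rates sit next to each other, so Moran's I is positive, and the region at the high end of the line receives the largest Gi* score. A real analysis would use county polygons and a spatial weights matrix derived from them, typically through a dedicated geostatistics library.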

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | PLOS Global Public Health | 293 | Top 0.2% | 22.9%
2 | PLOS ONE | 4510 | Top 14% | 12.9%
3 | AIDS | 31 | Top 0.1% | 8.6%
4 | The Lancet Global Health | 24 | Top 0.2% | 4.9%
5 | BMC Infectious Diseases | 118 | Top 0.8% | 4.0%
(50% of probability mass above)
6 | BMJ Global Health | 98 | Top 0.8% | 4.0%
7 | PLOS Medicine | 98 | Top 1% | 3.7%
8 | American Journal of Epidemiology | 57 | Top 0.4% | 2.6%
9 | PLOS Digital Health | 91 | Top 1% | 2.1%
10 | Clinical Infectious Diseases | 231 | Top 2% | 1.9%
11 | JMIR Public Health and Surveillance | 45 | Top 1% | 1.9%
12 | BMC Medicine | 163 | Top 3% | 1.8%
13 | BMJ Open | 554 | Top 9% | 1.7%
14 | Scientific Reports | 3102 | Top 63% | 1.4%
15 | The American Journal of Tropical Medicine and Hygiene | 60 | Top 3% | 1.4%
16 | The Journal of Infectious Diseases | 182 | Top 3% | 1.2%
17 | Journal of the International AIDS Society | 20 | Top 0.3% | 1.2%
18 | Malaria Journal | 48 | Top 1% | 1.1%
19 | Tropical Medicine & International Health | 15 | Top 0.5% | 0.9%
20 | Epidemics | 104 | Top 1% | 0.9%
21 | PLOS Computational Biology | 1633 | Top 22% | 0.9%
22 | eLife | 5422 | Top 53% | 0.9%
23 | Transactions of The Royal Society of Tropical Medicine and Hygiene | 16 | Top 0.5% | 0.8%
24 | Nature Communications | 4913 | Top 62% | 0.8%
25 | BMC Public Health | 147 | Top 6% | 0.8%
26 | EClinicalMedicine | 21 | Top 0.9% | 0.8%
27 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.7%
28 | PLOS Neglected Tropical Diseases | 378 | Top 6% | 0.7%
29 | Wellcome Open Research | 57 | Top 3% | 0.5%
30 | BMC Medical Research Methodology | 43 | Top 2% | 0.5%