
Machine learning-based prediction of cardiovascular disease risk in Africa using WHO Stepwise Surveys: 2014-2019

Ng'ambi, W.; Merzouki, A.; Estill, J. G.; Keiser, O. G.; Orel, E.

2026-02-26 epidemiology
10.64898/2026.02.23.26346870 medRxiv

Introduction: Cardiovascular diseases (CVDs) are the leading cause of death globally, with rising burdens in Africa due to ageing populations, lifestyle changes, and poor risk factor control. Conventional risk scores developed in high-income settings often perform poorly in African populations. Machine-learning (ML) approaches offer potential to improve prediction by capturing complex, non-linear interactions among demographic, behavioural, and biological factors. This study applies ML models to WHO STEPS survey data to generate context-specific CVD risk predictions across 12 African countries.

Methods: We analysed data from 60,294 adults collected in WHO STEPS surveys between 2014 and 2019 across 12 African countries. Three ML models, Elastic Net logistic regression (LASSO), Random Forest (RF), and XGBoost (XGB), were trained to predict self-reported CVD outcomes. Data were split into training (80%) and testing (20%) sets with five-fold cross-validation. Feature selection used the Boruta algorithm, and model performance was assessed via accuracy, sensitivity, specificity, AUC, F1 score, and Brier score.

Results: Overall CVD prevalence was 5%. Hypertension emerged as the strongest predictor across all models, followed by alcohol-related harm. Tree-based models outperformed regression approaches and conventional clinical scores, with XGBoost achieving the highest discrimination (AUC = 0.769), the highest balanced accuracy (0.699), and the best calibration (Brier score = 0.195). Predicted risk trajectories were smoother and more clinically plausible than Framingham or WHO/ISH scores, particularly across age, sex, and hypertension status. LASSO and Random Forest performed moderately, while conventional risk scores showed poor discrimination and marked miscalibration.

Conclusion: Machine-learning approaches provide accurate, context-specific cardiovascular risk prediction in African populations. By highlighting modifiable risk factors such as hypertension and alcohol-related harm, these models support targeted interventions aligned with WHO PEN, HEARTS, and SBIRT strategies. The African CVD Risk Prediction Tool translates complex data into actionable insights, offering a scalable platform for prevention-focused, equitable cardiovascular care across diverse African settings.
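For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, balanced accuracy, and the Brier score are conventionally defined. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions chosen for illustration.

```python
from typing import Dict, Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity and balanced accuracy at a given threshold.

    The 0.5 default is an illustrative assumption, not a value from the study.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical toy data (not from the STEPS surveys).
toy_labels = [1, 0, 0, 1, 0, 0, 0, 1]
toy_probs = [0.9, 0.2, 0.4, 0.3, 0.1, 0.6, 0.2, 0.8]

toy_metrics = threshold_metrics(toy_labels, toy_probs)
toy_brier = brier_score(toy_labels, toy_probs)
```

Balanced accuracy averages sensitivity and specificity, which makes it more informative than plain accuracy when the outcome is rare, as with the 5% CVD prevalence reported here. For the Brier score, lower values indicate better-calibrated probabilities.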

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. International Journal of Epidemiology (74 papers in training set; Top 0.1%; 26.5%)
2. Journal of the American Heart Association (119 papers in training set; Top 1.0%; 7.0%)
3. PLOS ONE (4510 papers in training set; Top 27%; 6.5%)
4. BMC Medicine (163 papers in training set; Top 0.9%; 4.4%)
5. Scientific Reports (3102 papers in training set; Top 30%; 4.1%)
6. PLOS Medicine (98 papers in training set; Top 1%; 3.7%)

50% of probability mass above

7. BMC Infectious Diseases (118 papers in training set; Top 1%; 3.1%)
8. Wellcome Open Research (57 papers in training set; Top 0.4%; 2.7%)
9. European Journal of Epidemiology (40 papers in training set; Top 0.2%; 2.1%)
10. eBioMedicine (130 papers in training set; Top 0.7%; 2.1%)
11. American Journal of Epidemiology (57 papers in training set; Top 0.7%; 1.7%)
12. BMC Public Health (147 papers in training set; Top 4%; 1.5%)
13. BMC Medical Research Methodology (43 papers in training set; Top 0.8%; 1.3%)
14. BMJ Global Health (98 papers in training set; Top 2%; 1.3%)
15. Nature Communications (4913 papers in training set; Top 57%; 1.1%)
16. The Lancet Global Health (24 papers in training set; Top 0.9%; 1.0%)
17. Epidemiology (26 papers in training set; Top 0.4%; 1.0%)
18. BMJ Open (554 papers in training set; Top 11%; 1.0%)
19. PLOS Global Public Health (293 papers in training set; Top 5%; 0.8%)
20. PLOS Computational Biology (1633 papers in training set; Top 23%; 0.8%)
21. Circulation: Genomic and Precision Medicine (42 papers in training set; Top 1%; 0.8%)
22. BMJ Open Diabetes Research & Care (15 papers in training set; Top 0.9%; 0.8%)
23. Journal of Epidemiology and Community Health (32 papers in training set; Top 0.6%; 0.8%)
24. PLOS Digital Health (91 papers in training set; Top 2%; 0.8%)
25. Nature Medicine (117 papers in training set; Top 5%; 0.8%)
26. Diabetologia (36 papers in training set; Top 1.0%; 0.7%)
27. International Journal of Medical Informatics (25 papers in training set; Top 2%; 0.7%)
28. American Journal of Preventive Medicine (11 papers in training set; Top 0.6%; 0.7%)
29. Circulation (66 papers in training set; Top 2%; 0.7%)
30. JMIR mHealth and uHealth (10 papers in training set; Top 0.4%; 0.7%)
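
The 50% cutoff in the list above can be checked with a running sum over the ranked probabilities. The sketch below is an assumption about how such a cutoff is computed, not the site's actual method. The function name `journals_to_reach` is hypothetical, and the input values are the rounded percentages shown for the first seven journals, so the total is approximate.

```python
from typing import Optional, Sequence, Tuple


def journals_to_reach(
    probabilities: Sequence[float], target: float = 50.0
) -> Optional[Tuple[int, float]]:
    """Return (number of journals, cumulative %) once the running sum hits target.

    Expects probabilities sorted in descending order, expressed in percent.
    Returns None if the target is never reached.
    """
    total = 0.0
    for count, probability in enumerate(probabilities, start=1):
        total += probability
        if total >= target:
            return count, total
    return None


# Rounded percentages for ranks 1-7, copied from the list above.
top_shares = [26.5, 7.0, 6.5, 4.4, 4.1, 3.7, 3.1]

cutoff = journals_to_reach(top_shares)
```

With these rounded values the running sum first passes 50% at the sixth journal (about 52.2%), which agrees with the statement that the top 6 journals account for 50% of the predicted probability mass.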