
Interpretable Lifestyle-Based Machine Learning Models for Ten-Year Cardiovascular Risk Prediction using data from the UK Biobank

Feng, Y.; Kunz, H.; Dziopa, K.

2026-02-01 health informatics
10.64898/2026.01.26.26344438 medRxiv

Background: Cardiovascular diseases (CVDs) remain the leading global cause of morbidity and mortality. In clinical practice, 10-year risk prediction tools such as the Pooled Cohort Equations, QRISK3, and SCORE2 are widely used because of their transparency and clinical trustworthiness, but they rely heavily on biomarkers and medical history. As a result, most recommendations concentrate on pharmaceutical or procedural management, and in many situations crucial biomarkers are unavailable, making it difficult to evaluate individual risk precisely and select appropriate treatments.

Objective: To develop interpretable, lifestyle-based machine learning models for predicting 10-year risk of cardiovascular disease (including heart failure and atrial fibrillation) and, more critically, to systematically compare interpretability algorithms and assess the cross-model consistency of the identified behavioural factors.

Methods: Using UK Biobank data, logistic regression, random forest, and XGBoost models were trained on lifestyle variables (including sleep, smoking, diet, physical activity, and electronic device use) and demographic variables only. Discrimination and calibration were evaluated, and interpretability was assessed using permutation importance, SHapley Additive exPlanations (SHAP), and Local Interpretable Model-agnostic Explanations (LIME), with subgroup analyses by sex and age to characterise heterogeneity in model behaviour and feature relevance.

Results: The developed models demonstrated good discrimination, with XGBoost performing best (ROC-AUC 0.726 [95% CI 0.720-0.731]; PR-AUC 0.199), closely followed by logistic regression (ROC-AUC 0.721 [95% CI 0.716-0.726]; PR-AUC 0.192), while random forest showed slightly lower performance. Despite this similar performance, interpretability analyses revealed inconsistencies in the models' importance rankings of lifestyle factors. Age, sex, and smoking behaviours consistently emerged as key contributors across all interpretability methods, demonstrating strong cross-model agreement, while other lifestyle factors such as dietary patterns, physical activity, and sleep showed model-dependent variation in their assigned importance. Subgroup analyses further indicated that modifiable behaviours (smoking, diet, sleep) were particularly influential among younger females, whereas cumulative exposures and family history were more dominant drivers in older males.

Conclusions: Lifestyle-only interpretable models offer a scalable and low-cost framework for cardiovascular risk assessment and behaviour-focused prevention, without requiring laboratory measurements or clinical testing. By comparing multiple interpretability algorithms across models, this study shows strong cross-method consistency and highlights lifestyle factors whose importance profiles differ from those in traditional biomarker-based calculators. These models can complement existing risk tools by highlighting modifiable behaviours, which is particularly valuable for younger adults. They can also support personalised feedback in digital-health settings to promote behavioural change. Overall, the findings support the development of transparent, behaviour-focused tools that enable accessible and equitable cardiovascular prevention.
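
As a rough illustration of one of the interpretability methods named in the abstract, the sketch below computes permutation importance as the average drop in ROC-AUC after shuffling one feature at a time. It is not the study's pipeline: the data are synthetic, the three features (`age`, `smoker`, `noise`) are invented for the example, and `hypothetical_risk_score` is a hand-written stand-in for a fitted model, not logistic regression, random forest, or XGBoost.

```python
import random

def roc_auc(y_true, scores):
    """ROC-AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in ROC-AUC when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = roc_auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - roc_auc(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Synthetic cohort (assumption): the outcome depends on age and smoking, not on "noise".
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    age = rng.uniform(40, 70)
    smoker = 1 if rng.random() < 0.3 else 0
    noise = rng.gauss(0, 1)
    latent = 0.08 * (age - 55) + 1.0 * smoker + rng.gauss(0, 1)
    X.append([age, smoker, noise])
    y.append(1 if latent > 1.0 else 0)

def hypothetical_risk_score(row):
    """Stand-in for a trained model; it ignores the third (noise) feature."""
    return 0.08 * row[0] + 1.0 * row[1]

baseline, importances = permutation_importance(hypothetical_risk_score, X, y)
print("baseline ROC-AUC:", round(baseline, 3))
for name, imp in zip(["age", "smoker", "noise"], importances):
    print(name, round(imp, 3))
```

Because the stand-in scorer never reads the noise column, shuffling it leaves the scores unchanged and its importance is exactly zero. With real models, rankings produced this way can differ between model families, which is the kind of cross-model variation the abstract reports.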

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

Rank. Journal: papers in training set; percentile; predicted probability

1. npj Digital Medicine: 97 papers in training set; Top 0.2%; 21.5%
2. BMC Medical Research Methodology: 43 papers in training set; Top 0.1%; 6.0%
3. PLOS ONE: 4510 papers in training set; Top 33%; 4.6%
4. PLOS Digital Health: 91 papers in training set; Top 0.6%; 4.1%
5. Journal of the American Heart Association: 119 papers in training set; Top 2%; 3.8%
6. Journal of Biomedical Informatics: 45 papers in training set; Top 0.4%; 3.8%
7. Scientific Reports: 3102 papers in training set; Top 39%; 3.5%
8. BMC Medicine: 163 papers in training set; Top 2%; 3.4%

50% of probability mass above

9. JMIR Medical Informatics: 17 papers in training set; Top 0.4%; 3.4%
10. The Lancet Digital Health: 25 papers in training set; Top 0.2%; 3.4%
11. JMIR Public Health and Surveillance: 45 papers in training set; Top 0.8%; 2.9%
12. European Heart Journal - Digital Health: 15 papers in training set; Top 0.2%; 2.6%
13. Frontiers in Cardiovascular Medicine: 49 papers in training set; Top 1%; 2.5%
14. JMIR mHealth and uHealth: 10 papers in training set; Top 0.2%; 2.0%
15. Journal of Medical Internet Research: 85 papers in training set; Top 2%; 1.8%
16. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 1%; 1.8%
17. BMJ Health & Care Informatics: 13 papers in training set; Top 0.5%; 1.6%
18. European Journal of Epidemiology: 40 papers in training set; Top 0.4%; 1.6%
19. Frontiers in Artificial Intelligence: 18 papers in training set; Top 0.3%; 1.6%
20. JAMIA Open: 37 papers in training set; Top 0.9%; 1.6%
21. Circulation: Genomic and Precision Medicine: 42 papers in training set; Top 0.8%; 1.4%
22. BMJ Open: 554 papers in training set; Top 10%; 1.4%
23. Medicine & Science in Sports & Exercise: 15 papers in training set; Top 0.3%; 1.2%
24. BMC Infectious Diseases: 118 papers in training set; Top 4%; 0.9%
25. BMC Cardiovascular Disorders: 14 papers in training set; Top 1%; 0.9%
26. Frontiers in Digital Health: 20 papers in training set; Top 1%; 0.9%
27. European Respiratory Journal: 54 papers in training set; Top 2%; 0.8%
28. eBioMedicine: 130 papers in training set; Top 5%; 0.7%
29. Nature Communications: 4913 papers in training set; Top 65%; 0.7%
30. PLOS Computational Biology: 1633 papers in training set; Top 28%; 0.6%
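
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked directly from the percentages listed above. The following is a minimal sketch; the helper name `journals_to_reach` is made up for this example, and the input is simply the 30 listed probabilities in rank order.

```python
# Predicted probabilities (%) for the 30 listed journals, in rank order.
probabilities = [
    21.5, 6.0, 4.6, 4.1, 3.8, 3.8, 3.5, 3.4, 3.4, 3.4,
    2.9, 2.6, 2.5, 2.0, 1.8, 1.8, 1.6, 1.6, 1.6, 1.6,
    1.4, 1.4, 1.2, 0.9, 0.9, 0.9, 0.8, 0.7, 0.7, 0.6,
]

def journals_to_reach(probs, threshold):
    """Return (rank, cumulative mass) at the first rank where the running total meets the threshold."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total  # threshold never reached within the list

rank, mass = journals_to_reach(probabilities, 50.0)
print(rank, round(mass, 1))  # prints: 8 50.7
```

The first seven journals sum to 47.3%, so the eighth (BMC Medicine) is the one that crosses the 50% mark.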