
Non-exercise Machine Learning Models for Maximal Oxygen Uptake Prediction in National Population Surveys

Liu, Y.; Herrin, J.; Huang, C.; Khera, R.; Dhingra, L. S.; Dong, W.; Mortazavi, B.; Krumholz, H. M.; Lu, Y.

2022-10-04 health informatics
10.1101/2022.09.30.22280471 medRxiv

Background: Maximal oxygen uptake (VO2 max), an indicator of cardiorespiratory fitness (CRF), requires exercise testing and, as a result, is rarely ascertained in large-scale population-based studies. Non-exercise algorithms are cost-effective methods to estimate VO2 max, but the existing models have limitations in generalizability and predictive power. This study aims to improve the non-exercise algorithms using machine learning (ML) methods and data from U.S. national population surveys.

Methods: We used the 1999-2004 data from the National Health and Nutrition Examination Survey (NHANES), in which a submaximal exercise test produced an estimate of VO2 max. We applied multiple supervised ML algorithms to build two models: a parsimonious model that used variables readily available in clinical practice, and an extended model that additionally included more complex variables from Dual-Energy X-ray Absorptiometry (DEXA) and standard laboratory tests. We used Shapley additive explanations (SHAP) to interpret the new model and identify the key predictors. For comparison, existing non-exercise algorithms were applied unmodified to the testing set.

Results: Among the 5,668 NHANES participants included in the final study population, the mean age was 32.5 years and 49.9% were women. Light Gradient Boosting Machine (LightGBM) had the best performance among the supervised ML algorithms evaluated. Compared with the best existing non-exercise algorithms that could be applied in NHANES, the parsimonious LightGBM model (RMSE: 8.51 ml/kg/min [95% CI: 7.73-9.33]) and the extended model (RMSE: 8.26 ml/kg/min [95% CI: 7.44-9.09]) significantly reduced the error by 15% and 12%, respectively (P<0.01 for both).

Conclusion: Our non-exercise ML model provides a more accurate prediction of VO2 max for NHANES participants than existing non-exercise algorithms.

What is Known
- Although cardiorespiratory fitness is recognized as an important marker of cardiovascular health, it is not routinely measured because of the time and resources required to perform exercise tests.
- Non-exercise algorithms are cost-effective alternatives to estimate cardiorespiratory fitness, but the existing models are restricted in generalizability and predictive power.

What the Study Adds
- We improve non-exercise algorithms for cardiorespiratory fitness prediction using advanced ML methods and a more comprehensive and representative data source from U.S. national population surveys.
- More health factors associated with cardiorespiratory fitness are newly identified.
- Nationally representative estimates for cardiorespiratory fitness in the U.S. over the past 20 years are generated.
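
The abstract reports each model's RMSE with a 95% confidence interval and a percentage reduction in error relative to a reference algorithm. The following is a minimal sketch of how such quantities could be computed. The abstract does not say how the intervals were obtained, so the percentile bootstrap here is an assumption, as are the function names (`rmse`, `bootstrap_rmse_ci`, `relative_reduction`) and the synthetic data in the demo.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def bootstrap_rmse_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for RMSE (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample participants with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


def relative_reduction(rmse_new, rmse_reference):
    """Fractional reduction in RMSE of a new model versus a reference model."""
    return (rmse_reference - rmse_new) / rmse_reference


if __name__ == "__main__":
    # Synthetic stand-in for measured and predicted VO2 max (ml/kg/min).
    demo_rng = random.Random(42)
    measured = [demo_rng.gauss(40.0, 9.0) for _ in range(500)]
    predicted = [m + demo_rng.gauss(0.0, 8.0) for m in measured]
    point = rmse(measured, predicted)
    low, high = bootstrap_rmse_ci(measured, predicted)
    print(f"RMSE {point:.2f} [95% CI: {low:.2f}-{high:.2f}]")
```

Resampling whole (measured, predicted) pairs keeps each participant's error intact, which is why the indices are drawn once per replicate and applied to both lists.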

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Medicine & Science in Sports & Exercise | 15 papers in training set | Top 0.1% | 28.5%
2. BMC Medical Research Methodology | 43 papers in training set | Top 0.1% | 6.6%
3. Scientific Reports | 3102 papers in training set | Top 22% | 5.0%
4. Frontiers in Cardiovascular Medicine | 49 papers in training set | Top 0.8% | 4.4%
5. PLOS ONE | 4510 papers in training set | Top 35% | 4.1%
6. JMIR Public Health and Surveillance | 45 papers in training set | Top 0.5% | 3.7%
(50% of probability mass above)
7. Journal of the American Heart Association | 119 papers in training set | Top 2% | 3.7%
8. JAMIA Open | 37 papers in training set | Top 0.5% | 2.7%
9. PLOS Digital Health | 91 papers in training set | Top 1% | 2.1%
10. BMC Medicine | 163 papers in training set | Top 3% | 1.7%
11. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 1% | 1.7%
12. BMJ Open | 554 papers in training set | Top 9% | 1.7%
13. Frontiers in Artificial Intelligence | 18 papers in training set | Top 0.3% | 1.7%
14. Journal of Medical Internet Research | 85 papers in training set | Top 3% | 1.7%
15. npj Digital Medicine | 97 papers in training set | Top 2% | 1.4%
16. European Journal of Epidemiology | 40 papers in training set | Top 0.4% | 1.3%
17. BMC Bioinformatics | 383 papers in training set | Top 5% | 1.3%
18. BMC Public Health | 147 papers in training set | Top 4% | 1.1%
19. JMIR mHealth and uHealth | 10 papers in training set | Top 0.3% | 1.0%
20. Frontiers in Physiology | 93 papers in training set | Top 5% | 0.9%
21. JMIR Medical Informatics | 17 papers in training set | Top 1% | 0.9%
22. Journal of Biomedical Informatics | 45 papers in training set | Top 1% | 0.9%
23. eLife | 5422 papers in training set | Top 55% | 0.8%
24. International Journal of Behavioral Nutrition and Physical Activity | 15 papers in training set | Top 0.4% | 0.8%
25. PeerJ | 261 papers in training set | Top 13% | 0.8%
26. American Journal of Preventive Medicine | 11 papers in training set | Top 0.5% | 0.8%
27. eClinicalMedicine | 55 papers in training set | Top 2% | 0.8%
28. European Journal of Preventive Cardiology | 13 papers in training set | Top 0.9% | 0.8%
29. Circulation: Genomic and Precision Medicine | 42 papers in training set | Top 1% | 0.8%
30. Preventive Medicine Reports | 14 papers in training set | Top 0.4% | 0.8%
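
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below does this for the first ten entries. The helper name `journals_to_reach` is hypothetical, and because the listed percentages are rounded to one decimal, the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-10, copied from the list above.
probabilities = [28.5, 6.6, 5.0, 4.4, 4.1, 3.7, 3.7, 2.7, 2.1, 1.7]


def journals_to_reach(probs, threshold):
    """Return (rank, cumulative %) at which the running sum first reaches threshold."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank, total
    return None  # threshold not reached within the supplied entries


rank, total = journals_to_reach(probabilities, 50.0)
print(f"Top {rank} journals cover about {total:.1f}% of the probability mass")
```

The first five entries sum to 48.6%, so the sixth entry is the one that carries the running total past the 50% mark.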