Non-exercise Machine Learning Models for Maximal Oxygen Uptake Prediction in National Population Surveys
Liu, Y.; Herrin, J.; Huang, C.; Khera, R.; Dhingra, L. S.; Dong, W.; Mortazavi, B.; Krumholz, H. M.; Lu, Y.
Background: Maximal oxygen uptake (VO2 max), an indicator of cardiorespiratory fitness (CRF), requires exercise testing and, as a result, is rarely ascertained in large-scale population-based studies. Non-exercise algorithms are cost-effective methods to estimate VO2 max, but the existing models have limitations in generalizability and predictive power. This study aims to improve the non-exercise algorithms using machine learning (ML) methods and data from U.S. national population surveys.

Methods: We used the 1999-2004 data from the National Health and Nutrition Examination Survey (NHANES), in which a submaximal exercise test provided an estimate of VO2 max. We applied multiple supervised ML algorithms to build two models: a parsimonious model that used variables readily available in clinical practice, and an extended model that additionally included more complex variables from Dual-Energy X-ray Absorptiometry (DEXA) and standard laboratory tests. We used Shapley additive explanations (SHAP) to interpret the new model and identify the key predictors. For comparison, existing non-exercise algorithms were applied unmodified to the testing set.

Results: Among the 5,668 NHANES participants included in the final study population, the mean age was 32.5 years and 49.9% were women. Among the supervised ML algorithms evaluated, Light Gradient Boosting Machine (LightGBM) had the best performance. Compared with the best existing non-exercise algorithms that could be applied in NHANES, the parsimonious LightGBM model (RMSE: 8.51 ml/kg/min [95% CI: 7.73-9.33]) and the extended model (RMSE: 8.26 ml/kg/min [95% CI: 7.44-9.09]) significantly reduced the error by 15% and 12%, respectively (P<0.01 for both).

Conclusion: Our non-exercise ML model provides a more accurate prediction of VO2 max for NHANES participants than existing non-exercise algorithms.
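The Results report each model's RMSE with a 95% confidence interval and a relative error reduction against an existing algorithm. The abstract does not say how the intervals were obtained, so the sketch below assumes a percentile bootstrap over the test set; the function names (`rmse`, `bootstrap_rmse_ci`, `relative_error_reduction`) and the defaults for `n_boot` and `seed` are illustrative, not taken from the study.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_rmse_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for RMSE.

    Assumption: the paper's CI method is not stated in the abstract;
    resampling test-set rows with replacement is one common choice.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample (observed, predicted) pairs together, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def relative_error_reduction(rmse_baseline, rmse_new):
    """Fraction by which the new model lowers RMSE relative to a baseline."""
    return (rmse_baseline - rmse_new) / rmse_baseline


if __name__ == "__main__":
    # Synthetic VO2 max values (ml/kg/min), for illustration only.
    rng = random.Random(42)
    observed = [rng.gauss(40.0, 9.0) for _ in range(300)]
    predicted = [v + rng.gauss(0.0, 8.0) for v in observed]
    point = rmse(observed, predicted)
    low, high = bootstrap_rmse_ci(observed, predicted)
    print(f"RMSE: {point:.2f} [95% CI: {low:.2f}-{high:.2f}]")
```

Under this definition, a "15% reduction" means the new model's RMSE is 0.85 times that of the comparison algorithm on the same test set.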
What is Known
- Although cardiorespiratory fitness is recognized as an important marker of cardiovascular health, it is not routinely measured because of the time and resources required to perform exercise tests.
- Non-exercise algorithms are cost-effective alternatives to estimate cardiorespiratory fitness, but the existing models are restricted in generalizability and predictive power.

What the Study Adds
- We improve non-exercise algorithms for cardiorespiratory fitness prediction using advanced ML methods and a more comprehensive and representative data source from U.S. national population surveys.
- Additional health factors associated with cardiorespiratory fitness are identified.
- Nationally representative estimates for cardiorespiratory fitness in the U.S. over the past 20 years are generated.