Factors Influencing Vitamin D Status in Guiyang, China: A Random Forest and SHAP Analysis

Pan, B.; Xian-ding, W.; Hong-lan, Y.

2026-05-18 health economics
10.64898/2026.05.13.26353105 medRxiv
Objective: To assess serum 25-hydroxyvitamin D [25(OH)D] levels in a health examination population in Guiyang, a low-latitude, high-altitude, cloudy city in southwestern China, and to identify key determinants using machine learning.

Methods: This retrospective study included 10,931 adults (>20 years) who underwent health checkups at Guiyang First People's Hospital between February 2019 and April 2025. In addition to conventional statistical comparisons, a two-stage machine learning approach was applied: LASSO regression for feature selection, followed by an optimized Random Forest regression model (mtry = 2). SHapley Additive exPlanations (SHAP) were used to quantify variable importance.

Results: The median serum 25(OH)D level was 36.63 nmol/L (IQR 24.77–53.17). Vitamin D deficiency (<50 nmol/L) was present in 70.98% of participants, whereas only 7.35% were sufficient (>75 nmol/L). Levels were significantly lower in females, in adults aged <30 years (deficiency rate 85.6%), and during spring. The optimized Random Forest model achieved a cross-validated RMSE of 21.427. SHAP analysis showed a clear hierarchy of importance: age (mean SHAP = 5.604) > season (4.104) > sex (1.533) ≈ BMI (1.501).

Conclusion: Vitamin D deficiency is highly prevalent in the Guiyang health examination population. Age and season are the dominant determinants, far outweighing sex and BMI. Targeted interventions should focus on young adults, females, and the spring season, especially in regions with similar cloudy highland climates.
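As a rough illustration of how a "mean SHAP" ranking like the one reported in the abstract can arise, the sketch below computes exact Shapley values for a small hypothetical model and averages their absolute values per feature. It is not the authors' pipeline: `toy_model`, its coefficients, the `DATA` rows, and the helper names are all invented for illustration, and the study itself used a Random Forest rather than a linear function.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x.

    Features outside a coalition are marginalized by substituting the
    values from every row of `background` and averaging the prediction.
    """
    n = len(x)

    def value(subset):
        # Mean prediction with features in `subset` fixed to x's values.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(predict, samples, background):
    """Global importance: mean absolute Shapley value per feature."""
    n = len(samples[0])
    totals = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, background)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


# Hypothetical stand-in for a fitted model (assumption, not study output).
def toy_model(row):
    age, season, sex = row
    return 30.0 + 0.5 * age + 4.0 * season + 1.0 * sex


FEATURES = ["age", "season", "sex"]
# Made-up rows: [age in years, season flag, sex flag]. Not study data.
DATA = [[25, 0, 0], [40, 1, 1], [60, 1, 0], [35, 0, 1]]

importance = mean_abs_shap(toy_model, DATA, DATA)
ranking = sorted(zip(FEATURES, importance), key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Exact enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one; tree-based approximations are the usual choice for a Random Forest.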

Matching journals

The top 2 journals account for 50% of the predicted probability mass.