Integrating Machine Learning-Based Variable Selection into Heat Vulnerability Index Design
Qu, S.; Sillmann, J.; Barrett, B. W.; Graffy, P. M.; Poschlod, B.; Brunner, L.; Mansour, R.; Szombathely, M. v.; Hay-Chapman, F.; Horton, T. H.; Chan, J.; Rao, S. K.; Woods, K.; Kho, A. N.; Horton, D. E.
As climate change intensifies, health risks from extreme heat are rising. Accurate assessment of heat vulnerability at high spatial resolution is crucial for developing effective adaptation strategies, particularly in socioeconomically heterogeneous urban settings. However, the identification of key indicators underlying heat vulnerability remains challenging. Using Chicago, Illinois (USA) as a case study, we systematically compare different variable selection strategies in community-level heat vulnerability assessments. We take the conventional unsupervised principal component analysis (PCA)-based Heat Vulnerability Index (HVI) as a baseline, and compare it with supervised approaches that incorporate variable selection, including machine learning algorithms (Lasso regression, Random Forest, and XGBoost) as well as traditional statistical methods (simple linear regression and polynomial regression). Using the vulnerability indicator subsets identified by each variable selection method, we construct multiple HVIs and evaluate their performance against heat-related excess mortality. Our work indicates that supervised variable selection improves the performance of HVIs in capturing heat-related health risks. Among all methods, the Random Forest-based variable selection algorithm achieves the best overall results, highlighting the potential of machine learning to enhance heat vulnerability assessment tools. Our results demonstrate that poverty rate, lack of air conditioning, and proportion of residents aged 65 and above are robust determinants of heat vulnerability in Chicago.
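The comparison described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It builds a PCA-based index from all indicators as a baseline, then rebuilds the index from a subset chosen by a supervised ranking, and checks each index against the outcome. Everything here is an assumption made for illustration: the data are synthetic, the function names (`pca_hvi`, `select_by_simple_regression`) are hypothetical, the HVI is taken to be the first principal component score, and the supervised step uses the R² of one-variable linear regressions (one of the traditional methods the abstract lists) rather than Random Forest, so that the example depends only on NumPy.

```python
import numpy as np

def zscore(X):
    """Standardize each indicator column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_hvi(X):
    """Unsupervised index: score on the first principal component.

    Assumption for this sketch: the index is the PC1 score of the
    standardized indicators, oriented so that loadings sum to >= 0.
    """
    Z = zscore(X)
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    pc1 = Vt[0]
    if pc1.sum() < 0:  # the sign of a principal component is arbitrary
        pc1 = -pc1
    return Z @ pc1

def select_by_simple_regression(X, y, top_k):
    """Supervised selection: rank indicators by the R^2 of a
    one-variable linear regression of the outcome on each indicator."""
    r2 = np.array([np.corrcoef(X[:, j], y)[0, 1] ** 2
                   for j in range(X.shape[1])])
    return np.argsort(r2)[::-1][:top_k]

# Synthetic data (hypothetical): 500 areas, 6 indicators.
rng = np.random.default_rng(0)
n = 500
# Indicators 0-2: moderately inter-correlated and related to the outcome.
shared_a = rng.normal(size=(n, 1))
A = np.sqrt(0.5) * shared_a + np.sqrt(0.5) * rng.normal(size=(n, 3))
# Indicators 3-5: strongly inter-correlated but unrelated to the outcome.
shared_b = rng.normal(size=(n, 1))
B = np.sqrt(0.9) * shared_b + np.sqrt(0.1) * rng.normal(size=(n, 3))
X = np.hstack([A, B])
# Stand-in for heat-related excess mortality.
y = A.sum(axis=1) + rng.normal(size=n)

# Baseline: PCA over all indicators, with no outcome information.
hvi_baseline = pca_hvi(X)
# Supervised variant: select indicators first, then build the index.
selected = select_by_simple_regression(X, y, top_k=3)
hvi_selected = pca_hvi(X[:, selected])

r_baseline = np.corrcoef(hvi_baseline, y)[0, 1]
r_selected = np.corrcoef(hvi_selected, y)[0, 1]
print("selected indicators:", sorted(selected.tolist()))
print(f"correlation with outcome, baseline: {r_baseline:.2f}")
print(f"correlation with outcome, selected: {r_selected:.2f}")
```

In this constructed example the unsupervised index is dominated by the block of indicators with the strongest mutual correlation, whether or not that block relates to the outcome, which is the failure mode supervised selection is meant to address. A real evaluation would need held-out data, since selecting variables and scoring the index on the same outcome inflates the apparent gain.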
Matching journals
The top 5 journals account for 50% of the predicted probability mass.