
Integrating Machine Learning-Based Variable Selection into Heat Vulnerability Index Design

Qu, S.; Sillmann, J.; Barrett, B. W.; Graffy, P. M.; Poschlod, B.; Brunner, L.; Mansour, R.; Szombathely, M. v.; Hay-Chapman, F.; Horton, T. H.; Chan, J.; Rao, S. K.; Woods, K.; Kho, A. N.; Horton, D. E.

2026-03-31 public and global health
10.64898/2026.03.29.26349672 medRxiv

As climate change intensifies, health risks from extreme heat are rising. Accurate assessment of heat vulnerability at high spatial resolution is crucial for developing effective adaptation strategies, particularly in socioeconomically heterogeneous urban settings. However, the identification of key indicators underlying heat vulnerability remains challenging. Using Chicago, Illinois (USA) as a case study, we systematically compare different variable selection strategies in community-level heat vulnerability assessments. We take the conventional unsupervised principal component analysis (PCA)-based Heat Vulnerability Index (HVI) as a baseline, and compare it with supervised approaches that incorporate variable selection, including machine learning algorithms (Lasso regression, Random Forest, and XGBoost) as well as traditional statistical methods (simple linear regression and polynomial regression). Using the vulnerability indicator subsets identified by each variable selection method, we construct multiple HVIs and evaluate their performance against heat-related excess mortality. Our work indicates that supervised variable selection improves the performance of HVIs in capturing heat-related health risks. Among all methods, the Random Forest-based variable selection algorithm achieves the best overall results, highlighting the potential of machine learning to enhance heat vulnerability assessment tools. Our results demonstrate that poverty rate, lack of air conditioning, and proportion of residents aged 65 and above are robust determinants of heat vulnerability in Chicago.
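The supervised workflow the abstract describes (select indicators against a health outcome, then build an index from the selected subset) can be illustrated with a minimal sketch. It uses simple linear regression screening, one of the traditional methods the abstract names, and is not the authors' implementation. The data are synthetic, and the indicator names, the `min_r2` threshold, and the equal-weight z-score composite are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list to zero mean and unit (population) variance."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson correlation, computed as the mean product of z-scores."""
    return mean(a * b for a, b in zip(zscores(x), zscores(y)))


def select_indicators(indicators, outcome, min_r2=0.05):
    """Keep indicators whose simple linear regression on the outcome has
    R^2 >= min_r2 (for one predictor, R^2 equals the squared correlation).
    The threshold is a hypothetical choice, not a value from the paper."""
    r2 = {name: pearson_r(vals, outcome) ** 2 for name, vals in indicators.items()}
    ranked = sorted(r2, key=r2.get, reverse=True)
    return [name for name in ranked if r2[name] >= min_r2]


def build_index(indicators, selected):
    """Assumed composite: equal-weight mean of z-scored selected indicators.
    Assumes each indicator is oriented so that higher means more vulnerable."""
    columns = [zscores(indicators[name]) for name in selected]
    return [mean(row) for row in zip(*columns)]


# Synthetic community-level data (hypothetical, for illustration only).
rng = random.Random(0)
n = 500
indicators = {
    "poverty": [rng.gauss(0, 1) for _ in range(n)],
    "no_ac": [rng.gauss(0, 1) for _ in range(n)],
    "age65": [rng.gauss(0, 1) for _ in range(n)],
    "unrelated": [rng.gauss(0, 1) for _ in range(n)],
}
# Synthetic outcome standing in for heat-related excess mortality.
outcome = [
    0.6 * p + 0.5 * a + 0.5 * e + rng.gauss(0, 0.5)
    for p, a, e in zip(indicators["poverty"], indicators["no_ac"], indicators["age65"])
]

selected = select_indicators(indicators, outcome)
hvi = build_index(indicators, selected)
print("selected indicators:", selected)
print("correlation of index with outcome:", round(pearson_r(hvi, outcome), 2))
```

Swapping `select_indicators` for a Lasso, Random Forest, or XGBoost importance ranking would follow the same pattern: score each indicator against the outcome, keep a subset, then build and evaluate the index.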

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Science of The Total Environment; 179 papers in training set; Top 0.4%; 14.8%
2. PLOS ONE; 4510 papers in training set; Top 13%; 14.5%
3. PLOS Global Public Health; 293 papers in training set; Top 1.0%; 9.2%
4. GeoHealth; 10 papers in training set; Top 0.1%; 7.2%
5. Scientific Reports; 3102 papers in training set; Top 14%; 6.9%
50% of probability mass above
6. International Journal of Environmental Research and Public Health; 124 papers in training set; Top 0.9%; 6.4%
7. Environment International; 42 papers in training set; Top 0.4%; 4.0%
8. Frontiers in Public Health; 140 papers in training set; Top 2%; 4.0%
9. Environmental Research; 46 papers in training set; Top 0.6%; 2.5%
10. Environmental Pollution; 35 papers in training set; Top 1%; 1.9%
11. Physics of Fluids; 13 papers in training set; Top 0.1%; 1.8%
12. Indoor Air; 10 papers in training set; Top 0.1%; 1.8%
13. Environmental Science & Technology; 64 papers in training set; Top 2%; 1.3%
14. Epidemics; 104 papers in training set; Top 1%; 1.0%
15. Spatial and Spatio-temporal Epidemiology; 10 papers in training set; Top 0.1%; 1.0%
16. JMIR Public Health and Surveillance; 45 papers in training set; Top 3%; 0.9%
17. Infectious Diseases of Poverty; 10 papers in training set; Top 0.3%; 0.9%
18. Journal of the American Heart Association; 119 papers in training set; Top 4%; 0.9%
19. BMC Public Health; 147 papers in training set; Top 5%; 0.8%
20. COVID; 13 papers in training set; Top 0.3%; 0.8%
21. BMJ Open; 554 papers in training set; Top 13%; 0.8%
22. BMJ Global Health; 98 papers in training set; Top 3%; 0.8%
23. Journal of The Royal Society Interface; 189 papers in training set; Top 5%; 0.7%
24. Nature Communications; 4913 papers in training set; Top 64%; 0.7%
25. Global Change Biology; 69 papers in training set; Top 2%; 0.5%
26. Journal of Environmental Management; 11 papers in training set; Top 1%; 0.5%
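
The 50% cutoff marked in the list can be checked by accumulating the listed probabilities in rank order until the running total reaches the threshold. The sketch below copies the percentages of the first ten journals from the list; the function name `journals_to_reach` is hypothetical, and treating the last figure on each entry as the predicted probability is an assumption based on the sentence introducing the list.

```python
# Predicted probabilities (percent) for ranks 1 to 10, copied from the list.
probabilities = [14.8, 14.5, 9.2, 7.2, 6.9, 6.4, 4.0, 4.0, 2.5, 1.9]


def journals_to_reach(probs, threshold):
    """Return (count, cumulative) for the shortest ranked prefix whose
    cumulative probability reaches the threshold, or None if it never does."""
    cumulative = 0.0
    for count, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= threshold:
            return count, cumulative
    return None


count, mass = journals_to_reach(probabilities, 50.0)
print(count, round(mass, 1))  # 5 52.6
```

The first four journals sum to 45.7%, so the fifth is the one that carries the total past 50%.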