
Exposome-Based Clustering of Urinary VOC and PAH Biomarkers Reveals Racially Patterned Cardiovascular Risk in a Nationally Representative US Cohort: A Machine Learning Analysis of NHANES 2017-2018

Anthonio, O. G.; Olowu, B. I.; Olawuyi, D. A.; Aderemi, T. V.; Ajayi, O. J.

2026-04-27 cardiovascular medicine
10.64898/2026.04.19.26351113 medRxiv

Background: Polycyclic aromatic hydrocarbons (PAHs) and volatile organic compounds (VOCs) are combustion-derived pollutants linked to cardiovascular disease. Prior NHANES analyses have evaluated these chemicals individually and so fail to capture the correlated co-exposure structures that characterize real-world environmental burden, underscoring the need for multi-chemical approaches. In this study, we applied an unsupervised machine learning pipeline to urinary biomarker data to identify multi-chemical exposure clusters and quantify their differential cardiovascular risk profiles in a nationally representative US sample.

Methods: We analyzed 2,979 participants from NHANES 2017-2018, representing an estimated 36.8 million US adults after complex survey weighting. Twenty-five urinary biomarkers (6 PAH and 19 VOC metabolites) were log-transformed, imputed using Multivariate Imputation by Chained Equations (MICE), and standardized. Uniform Manifold Approximation and Projection (UMAP) was used for dimensionality reduction, followed by Gaussian Mixture Model (GMM) clustering. Survey-weighted prevalence estimates with 95% confidence intervals (CIs) were calculated for hypertension and high total cholesterol within each cluster. Weighted multivariable logistic regression was used to estimate odds ratios (ORs) for hypertension, adjusting for age, sex, race/ethnicity, and income.

Results: Four exposure clusters were identified, with a mean assignment probability of 0.948. The High combustion cluster (n=370; estimated 5.1 million US adults) exhibited the highest multi-chemical burden and a weighted hypertension prevalence of 39.3% (95% CI 37.2-41.4%), compared with 28.7% (95% CI 21.9-35.5%) in the Low exposure reference group. After demographic adjustment, High combustion cluster membership was independently associated with 38.4% higher odds of prevalent hypertension (OR 1.38). The prediction model achieved a cross-validated area under the receiver operating characteristic curve (AUC) of 0.849 (SD 0.017). Non-Hispanic Black participants constituted approximately 40% of the High combustion cluster, exceeding their representation in lower-risk clusters.

Conclusions: Multi-chemical exposome profiling identifies four cardiovascularly distinct subpopulations in the US adult population. Membership in the High combustion exposure cluster was associated with higher odds of prevalent hypertension and disproportionately affected Non-Hispanic Black participants. These findings support the use of multi-chemical approaches over single-pollutant analyses and highlight the relevance of environmental exposure patterns for policy-making and targeted cardiovascular risk stratification.
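The preprocessing and clustering steps named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, scikit-learn's `IterativeImputer` stands in for MICE, PCA is swapped in for UMAP so the example needs only NumPy and scikit-learn, and reading "mean assignment probability" as the average of each participant's largest GMM posterior is an assumption. Survey weights are not handled here.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for 25 urinary biomarker concentrations (strictly positive).
n_participants, n_biomarkers = 300, 25
concentrations = rng.lognormal(mean=0.0, sigma=1.0, size=(n_participants, n_biomarkers))

# Blank out roughly 5% of values to mimic missing laboratory measurements.
missing_mask = rng.random(concentrations.shape) < 0.05
concentrations[missing_mask] = np.nan

# 1. Log-transform; missing values stay NaN and are imputed in the next step.
log_conc = np.log(concentrations)

# 2. Chained-equations imputation (a MICE-style imputer from scikit-learn).
imputed = IterativeImputer(max_iter=5, random_state=0).fit_transform(log_conc)

# 3. Standardize each biomarker to zero mean and unit variance.
scaled = StandardScaler().fit_transform(imputed)

# 4. Dimensionality reduction. ASSUMPTION: PCA replaces the UMAP step here.
embedding = PCA(n_components=2, random_state=0).fit_transform(scaled)

# 5. Gaussian Mixture Model with four components, matching the four clusters reported.
gmm = GaussianMixture(n_components=4, random_state=0).fit(embedding)
posteriors = gmm.predict_proba(embedding)   # shape: (participants, clusters)
labels = posteriors.argmax(axis=1)          # hard cluster assignment

# ASSUMPTION: mean assignment probability = mean of each row's maximum posterior.
mean_assignment_probability = posteriors.max(axis=1).mean()
print(f"mean assignment probability: {mean_assignment_probability:.3f}")
```

On random synthetic data the clusters carry no meaning; the sketch only shows how the steps chain together and how a single summary of assignment certainty could be derived from the GMM posteriors.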

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Environment International | 42 | Top 0.1% | 14.8% |
| 2 | Toxicology and Applied Pharmacology | 13 | Top 0.1% | 14.5% |
| 3 | Journal of the American Heart Association | 119 | Top 0.6% | 10.2% |
| 4 | Scientific Reports | 3102 | Top 17% | 6.4% |
| 5 | Environmental Health Perspectives | 17 | Top 0.1% | 6.4% |
| | 50% of probability mass above | | | |
| 6 | PLOS ONE | 4510 | Top 39% | 3.6% |
| 7 | Environmental Science & Technology | 64 | Top 0.9% | 3.3% |
| 8 | The Lancet Digital Health | 25 | Top 0.2% | 3.1% |
| 9 | Circulation | 66 | Top 1% | 3.1% |
| 10 | PNAS Nexus | 147 | Top 0.2% | 1.9% |
| 11 | PLOS Global Public Health | 293 | Top 3% | 1.8% |
| 12 | Canadian Medical Association Journal | 15 | Top 0.1% | 1.7% |
| 13 | Nature Communications | 4913 | Top 52% | 1.7% |
| 14 | Science of The Total Environment | 179 | Top 3% | 1.5% |
| 15 | Arteriosclerosis, Thrombosis, and Vascular Biology | 65 | Top 1% | 1.5% |
| 16 | International Journal of Environmental Research and Public Health | 124 | Top 5% | 1.2% |
| 17 | BMC Medicine | 163 | Top 5% | 1.0% |
| 18 | European Respiratory Journal | 54 | Top 1% | 1.0% |
| 19 | Environmental Pollution | 35 | Top 2% | 0.9% |
| 20 | Genome Medicine | 154 | Top 7% | 0.9% |
| 21 | Kidney360 | 22 | Top 0.5% | 0.8% |
| 22 | Environmental Research | 46 | Top 2% | 0.8% |
| 23 | The Journal of Infectious Diseases | 182 | Top 5% | 0.8% |
| 24 | American Journal of Epidemiology | 57 | Top 2% | 0.7% |
| 25 | eLife | 5422 | Top 59% | 0.7% |
| 26 | eBioMedicine | 130 | Top 5% | 0.7% |
| 27 | International Journal of Epidemiology | 74 | Top 3% | 0.7% |
| 28 | The Innovation | 12 | Top 1% | 0.6% |
| 29 | Respiratory Research | 19 | Top 0.6% | 0.6% |
| 30 | Frontiers in Neurology | 91 | Top 6% | 0.6% |
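
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a running total over the listed percentages. The following is a small sketch using only the values shown in the list above; the helper name `journals_needed` is hypothetical and introduced purely for illustration.

```python
from itertools import accumulate

# Predicted probabilities (%) for the ten highest-ranked journals, copied from the list above.
top_probabilities = [14.8, 14.5, 10.2, 6.4, 6.4, 3.6, 3.3, 3.1, 3.1, 1.9]


def journals_needed(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running total to reach threshold."""
    for count, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return count
    return None  # threshold never reached with the supplied entries


cumulative = list(accumulate(top_probabilities))
print(journals_needed(top_probabilities, 50.0))
```

The running total first crosses 50% at the fifth entry, which is where the list places its "50% of probability mass above" marker.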