Social Determinants of Health and Chronic Disease Risk Prediction in the All of Us Research Program

Kammer-Kerwick, M.; Dave, Y.; Parekh, V.; McDonald, L.; Watkins, S. C.

2026-03-23 health informatics
10.64898/2026.03.19.26348851 medRxiv

Social determinants of health (SDoH), the social, economic, and environmental conditions shaping health trajectories, contribute to chronic disease risk comparably to clinical factors, yet most predictive studies model conditions independently, obscuring shared social pathways. Using participant-reported data from the All of Us Research Program (n=259,186), we evaluated the relative contributions of demographic factors and twelve SDoH domains to chronic disease prediction while accounting for the co-occurrence structure of conditions. Hierarchical clustering identified two clinically meaningful outcome clusters: a Mental Health cluster (depression, anxiety, substance use disorder; prevalence = 51.7%) and a Cardiometabolic cluster (heart disease, diabetes, chronic lung disease; prevalence = 78.7%). Gradient boosted models were trained for each cluster under three feature configurations, SDoH only, demographics only, and combined, with performance evaluated using bootstrapped area under the receiver operating characteristic curve (AUC). Combined models achieved the highest discriminative performance for Mental Health (AUC = 0.701, 95% confidence interval: 0.696 - 0.705) and Cardiometabolic (AUC = 0.662, 95% CI: 0.655 - 0.668) outcomes. SDoH features outperformed demographics for Mental Health prediction (AUC = 0.678 vs. 0.655), while performance was comparable for Cardiometabolic outcomes (SDoH = 0.633; demographics = 0.636). Interpretability analysis using SHapley Additive exPlanations (SHAP) identified stress, discrimination, and religion/spirituality as the most influential SDoH domains for Mental Health outcomes; age, neighborhood disorder, and discrimination were primary predictors for Cardiometabolic outcomes. Double machine learning confirmed significant causal effects, with stress showing the largest average treatment effect on Mental Health outcomes (ATE = 0.093, p < 0.001). 
Interaction analyses revealed 24 significant SDoH-by-demographic interactions, indicating differential SDoH effects across racial/ethnic and gender/sexual minority subgroups. These findings indicate that experiential social factors carry a stronger predictive signal for mental health conditions, while cardiometabolic conditions are more strongly shaped by demographic and structural neighborhood characteristics. Results support condition-specific SDoH screening protocols over universal instruments, as well as targeted social interventions to reduce health disparities.

Author Summary

We developed and tested a four-stage analytical framework to predict chronic disease risk more precisely by combining individual social determinants of health (one's social environments, stress levels, neighborhood conditions, and community connections) with conventional patient demographics such as age, income, and race/ethnicity. Using data from nearly 260,000 participants in the All of Us Research Program, we found that including social and environmental factors meaningfully improves prediction of both mental health conditions (depression, anxiety, and substance use) and cardiometabolic conditions (heart disease, diabetes, and lung disease). Importantly, not all social factors matter equally for all conditions. Mental health outcomes were most strongly shaped by experiential factors (stress, discrimination, and loneliness), while cardiometabolic outcomes were more strongly driven by age and neighborhood characteristics such as disorder and limited access to physical activity. We also found that stress, discrimination, and neighborhood disadvantage have stronger health effects among Black, Hispanic, and gender/sexual minority individuals, pointing to where targeted interventions could reduce persistent health disparities. These findings suggest that clinicians and health systems should move away from one-size-fits-all social needs screening toward condition-specific tools that prioritize the social factors most relevant to the conditions being managed.
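
The abstract reports each AUC with a bootstrapped 95% confidence interval. As a rough illustration of that idea only, and not the authors' pipeline, the sketch below computes a rank-based AUC and a percentile bootstrap interval using the Python standard library. The function names, the synthetic labels and scores, and the choice of a percentile interval are all assumptions made for this example.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is O(n^2) and is
    meant for small illustrative inputs only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lost one class, so AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return auc(labels, scores), lower, upper


# Synthetic data (assumed, not from the study): scores weakly track labels.
demo_rng = random.Random(42)
labels = [demo_rng.randint(0, 1) for _ in range(200)]
scores = [0.5 * y + demo_rng.gauss(0, 1) for y in labels]
point, lower, upper = bootstrap_auc_ci(labels, scores, n_boot=200)
```

A call such as `bootstrap_auc_ci(labels, scores)` returns the point estimate together with the lower and upper interval bounds. At the study's sample size, a sort-based AUC from an established library would be a more practical choice than the pairwise loop shown here.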

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Journal of Biomedical Informatics | 45 papers in training set | Top 0.1% | 18.3%
2. Scientific Reports | 3102 papers in training set | Top 10% | 8.3%
3. PLOS Digital Health | 91 papers in training set | Top 0.5% | 4.8%
4. Journal of the American Heart Association | 119 papers in training set | Top 2% | 3.9%
5. BMC Medicine | 163 papers in training set | Top 1% | 3.8%
6. PLOS ONE | 4510 papers in training set | Top 40% | 3.5%
7. The Lancet Digital Health | 25 papers in training set | Top 0.1% | 3.5%
8. npj Digital Medicine | 97 papers in training set | Top 1% | 3.5%
9. Nature Communications | 4913 papers in training set | Top 43% | 3.0%
50% of probability mass above
10. Journal of Medical Internet Research | 85 papers in training set | Top 2% | 1.9%
11. Biological Psychiatry | 119 papers in training set | Top 1% | 1.9%
12. Journal of the American Medical Informatics Association | 61 papers in training set | Top 1% | 1.9%
13. JMIR Public Health and Surveillance | 45 papers in training set | Top 2% | 1.8%
14. PNAS Nexus | 147 papers in training set | Top 0.3% | 1.7%
15. Frontiers in Public Health | 140 papers in training set | Top 5% | 1.7%
16. Social Science & Medicine | 15 papers in training set | Top 0.5% | 1.6%
17. JMIR Medical Informatics | 17 papers in training set | Top 0.8% | 1.6%
18. BMC Medical Research Methodology | 43 papers in training set | Top 0.7% | 1.5%
19. JAMIA Open | 37 papers in training set | Top 1% | 1.3%
20. JAMA Network Open | 127 papers in training set | Top 3% | 1.2%
21. eClinicalMedicine | 55 papers in training set | Top 1% | 1.2%
22. SSM - Population Health | 17 papers in training set | Top 0.3% | 1.2%
23. BMJ Health & Care Informatics | 13 papers in training set | Top 0.7% | 0.9%
24. BMC Public Health | 147 papers in training set | Top 5% | 0.9%
25. International Journal of Environmental Research and Public Health | 124 papers in training set | Top 6% | 0.9%
26. Patterns | 70 papers in training set | Top 2% | 0.9%
27. International Journal of Medical Informatics | 25 papers in training set | Top 1% | 0.9%
28. Science Advances | 1098 papers in training set | Top 28% | 0.8%
29. Annals of Internal Medicine | 27 papers in training set | Top 0.9% | 0.8%
30. Frontiers in Cardiovascular Medicine | 49 papers in training set | Top 3% | 0.7%
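
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked against the percentages listed for the first ten journals. The sketch below is a minimal illustration: the function name is hypothetical, and it assumes the displayed, rounded percentages are close enough to the underlying values for the cutoff to land in the same place.

```python
def entries_to_reach(probabilities, threshold=50.0):
    """Count how many top-ranked entries are needed for the running total
    of their percentages to reach the threshold (hypothetical helper)."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count
    return None  # threshold never reached with the given entries


# Percentages of the ten highest-ranked journals, as displayed in the list.
top_probs = [18.3, 8.3, 4.8, 3.9, 3.8, 3.5, 3.5, 3.5, 3.0, 1.9]
n_needed = entries_to_reach(top_probs)
```

With these rounded values, the first eight journals sum to roughly 49.6% and the ninth brings the total to roughly 52.6%, which is consistent with the cutoff falling after the ninth entry.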