Social Determinants of Health and Chronic Disease Risk Prediction in the All of Us Research Program
Kammer-Kerwick, M.; Dave, Y.; Parekh, V.; McDonald, L.; Watkins, S. C.
Social determinants of health (SDoH), the social, economic, and environmental conditions that shape health trajectories, contribute to chronic disease risk comparably to clinical factors, yet most predictive studies model conditions independently, obscuring shared social pathways. Using participant-reported data from the All of Us Research Program (n = 259,186), we evaluated the relative contributions of demographic factors and twelve SDoH domains to chronic disease prediction while accounting for the co-occurrence structure of conditions. Hierarchical clustering identified two clinically meaningful outcome clusters: a Mental Health cluster (depression, anxiety, substance use disorder; prevalence = 51.7%) and a Cardiometabolic cluster (heart disease, diabetes, chronic lung disease; prevalence = 78.7%). Gradient boosted models were trained for each cluster under three feature configurations (SDoH only, demographics only, and combined), and performance was evaluated using the bootstrapped area under the receiver operating characteristic curve (AUC). Combined models achieved the highest discriminative performance for both Mental Health (AUC = 0.701, 95% confidence interval [CI]: 0.696-0.705) and Cardiometabolic (AUC = 0.662, 95% CI: 0.655-0.668) outcomes. SDoH features outperformed demographics for Mental Health prediction (AUC = 0.678 vs. 0.655), while performance was comparable for Cardiometabolic outcomes (AUC: SDoH = 0.633; demographics = 0.636). Interpretability analysis using SHapley Additive exPlanations (SHAP) identified stress, discrimination, and religion/spirituality as the most influential SDoH domains for Mental Health outcomes, whereas age, neighborhood disorder, and discrimination were the primary predictors for Cardiometabolic outcomes. Double machine learning confirmed statistically significant causal effects, with stress showing the largest average treatment effect (ATE) on Mental Health outcomes (ATE = 0.093, p < 0.001).
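The double machine learning (DML) step can be illustrated with a minimal cross-fitted "partialling-out" estimator for a partially linear model, written in standard-library Python. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not specify the DML variant, the nuisance learners, or the number of folds, so the helper names (`fit_group_means`, `dml_plr_ate`), the two-fold split, the group-mean learner standing in for a gradient boosted model, and the simulated data (a binary "high stress" exposure, a binary outcome, and a three-level confounder) are all hypothetical. The estimator residualizes both the outcome and the treatment on the confounders using out-of-fold predictions, then regresses the outcome residuals on the treatment residuals.

```python
import math
import random
from collections import defaultdict


def fit_group_means(xs, targets):
    # Hypothetical nuisance learner: mean of the target within each category of x,
    # falling back to the overall mean for categories unseen in the training fold.
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, ti in zip(xs, targets):
        sums[xi] += ti
        counts[xi] += 1
    overall = sum(targets) / len(targets)
    return lambda xi: sums[xi] / counts[xi] if counts.get(xi) else overall


def dml_plr_ate(y, d, x, n_folds=2, seed=0):
    # Cross-fitted partialling-out estimator for Y = theta * D + g(X) + noise.
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    v = [0.0] * n  # treatment residuals D - m(X), predicted out of fold
    w = [0.0] * n  # outcome residuals Y - l(X), predicted out of fold
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        l_hat = fit_group_means([x[i] for i in train], [y[i] for i in train])
        m_hat = fit_group_means([x[i] for i in train], [d[i] for i in train])
        for i in test:
            w[i] = y[i] - l_hat(x[i])
            v[i] = d[i] - m_hat(x[i])
    # Residual-on-residual regression gives the effect estimate.
    theta = sum(vi * wi for vi, wi in zip(v, w)) / sum(vi * vi for vi in v)
    # Sandwich-style standard error from the orthogonal score (w - theta * v) * v.
    j = sum(vi * vi for vi in v) / n
    psi_sq = sum(((wi - theta * vi) * vi) ** 2 for vi, wi in zip(v, w)) / n
    se = math.sqrt(psi_sq / j ** 2 / n)
    return theta, se


# Hypothetical simulated data with a true constant effect of 0.1 on outcome probability.
rng = random.Random(42)
n = 20000
x = [rng.randrange(3) for _ in range(n)]  # confounder, e.g. an age band
d = [1 if rng.random() < 0.2 + 0.2 * xi else 0 for xi in x]  # exposure depends on x
y = [1 if rng.random() < 0.3 + 0.1 * xi + 0.1 * di else 0 for xi, di in zip(x, d)]

theta, se = dml_plr_ate(y, d, x)
p_value = math.erfc(abs(theta / se) / math.sqrt(2))  # two-sided normal p-value
print(f"effect = {theta:.3f}, SE = {se:.3f}, p = {p_value:.2g}")
```

Because the simulated effect is constant, the partially linear coefficient coincides with the ATE; with heterogeneous effects it becomes a variance-weighted average, and an interactive (doubly robust) DML score would be the closer match to an ATE.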
Interaction analyses revealed 24 significant SDoH-by-demographic interactions, showing that SDoH effects differ across racial/ethnic and gender/sexual minority subgroups. These findings indicate that experiential social factors carry a stronger predictive signal for mental health conditions, whereas cardiometabolic conditions are more strongly shaped by demographic and structural neighborhood characteristics. The results support condition-specific SDoH screening protocols over universal instruments, along with targeted social interventions to reduce health disparities.
Author Summary
We developed and tested a four-stage analytical framework that predicts chronic disease risk more precisely by combining individual social determinants of health (one's social environment, stress levels, neighborhood conditions, and community connections) with conventional patient demographics such as age, income, and race/ethnicity. Using data from nearly 260,000 participants in the All of Us Research Program, we found that including social and environmental factors meaningfully improves prediction of both mental health conditions (depression, anxiety, and substance use) and cardiometabolic conditions (heart disease, diabetes, and lung disease). Importantly, not all social factors matter equally for all conditions. Mental health outcomes were most strongly shaped by experiential factors (stress, discrimination, and loneliness), while cardiometabolic outcomes were more strongly driven by age and by neighborhood characteristics such as disorder and limited access to physical activity. We also found that stress, discrimination, and neighborhood disadvantage have stronger health effects among Black, Hispanic, and gender/sexual minority individuals, pointing to where targeted interventions could reduce persistent health disparities.
These findings suggest that clinicians and health systems should move away from one-size-fits-all social needs screening toward condition-specific tools that prioritize the social factors most relevant to the conditions being managed.
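The abstract reports each model's discrimination as a bootstrapped AUC with a 95% confidence interval. One common way to compute this is a percentile bootstrap over held-out cases, sketched below in standard-library Python. The abstract does not state the resampling scheme or the number of resamples, so `auc`, `bootstrap_auc_ci`, the resample count, and the simulated labels and scores are hypothetical; in the study, the scores would come from the fitted gradient boosted models.

```python
import random


def auc(y_true, scores):
    # Rank-based (Mann-Whitney) AUC: probability that a randomly chosen positive
    # scores higher than a randomly chosen negative, with ties counted as 1/2.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample cases with replacement, recompute the AUC,
    # and take the alpha/2 and 1 - alpha/2 quantiles of the resampled AUCs.
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, scores), lower, upper


# Hypothetical held-out data: binary labels plus scores from some fitted classifier.
rng = random.Random(1)
labels = [1 if rng.random() < 0.5 else 0 for _ in range(500)]
model_scores = [label + rng.gauss(0, 1) for label in labels]

point, lower, upper = bootstrap_auc_ci(labels, model_scores, n_boot=500)
print(f"AUC = {point:.3f}, 95% CI: {lower:.3f}-{upper:.3f}")
```

To compare feature configurations (for example, SDoH only vs. demographics only), scoring both models on the same resampled indices would give a paired estimate of their AUC difference; that design choice is likewise an assumption, not something the abstract describes.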