
A Combined Predictive and Causal Approach for Neighborhood-Level Diabetes Detection

Noaeen, M.; Rostami, A.; Ghanem, I.; Saarela, O.; Keshavjee, K.; Brook, J. R.; Shakeri, Z.

2025-03-05 endocrinology
10.1101/2025.02.28.25323125 medRxiv

Objective: Develop a neighborhood-level framework using machine learning and causal inference to identify socioeconomic and behavioral drivers of Type 2 diabetes for targeted public health interventions.

Materials and Methods: Data from 1,149 Census Tracts in Toronto were integrated, linking demographic, health, and marginalization indices. Seven machine learning models classified neighborhoods with high diabetes prevalence. Feature engineering mitigated skewness and correlation, while Causal Forests estimated the Conditional Average Treatment Effect (CATE, τ) for predictors such as work stress, smoking, and mental health.

Results: Predictive models achieved over 90% recall and high AUC metrics on both test and external validation datasets. Key predictors included obesity, overweight status, physical activity, and log-transformed median age. Causal analysis further indicated that elevated work stress (τ = 0.312) and daily smoking (τ = 0.155) increased diabetes risk, while stronger mental health (τ ≈ -1.1) was protective.

Discussion: While genetic and clinical factors often dominate the conversation on diabetes, data is often restricted to confirmed diagnoses or not readily available for prevalence analyses. Our study shows how neighborhood contexts, including walkability, stress levels, and socioeconomic differences, help drive rising disease rates. We integrated machine learning classifiers with causal inference to examine how interventions, such as active transportation and adjusted work stress, could shift diabetes risk.

Conclusion: This integrated method offers a blueprint for precision public health by clarifying how modifiable neighborhood factors affect diabetes risk. It can help tailor interventions to community needs and is applicable to other areas facing similar chronic disease challenges.
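For readers unfamiliar with the notation, the CATE reported in the abstract is conventionally defined in the potential-outcomes framework as shown below. This is the textbook form for a binary treatment, given here as an assumption: the abstract does not spell out the estimand, and the symbols $Y(1)$, $Y(0)$, and $X$ are standard notation rather than the authors' own. If a predictor such as work stress was treated as continuous, the authors' estimand may differ from this form.

```latex
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]
```

Here $Y(1)$ and $Y(0)$ denote the outcome with and without the treatment, and $X$ denotes the neighborhood covariates. Under this reading, a positive $\tau$ indicates that the treatment raises the outcome, and a negative $\tau$ indicates that it lowers it.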

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1 | JMIR Public Health and Surveillance | 45 papers in training set | Top 0.1% | 18.6%
2 | PLOS Global Public Health | 293 papers in training set | Top 2% | 6.3%
3 | BMC Medicine | 163 papers in training set | Top 1% | 4.0%
4 | PLOS ONE | 4510 papers in training set | Top 36% | 4.0%
5 | Nature Communications | 4913 papers in training set | Top 37% | 4.0%
6 | BMJ Open Diabetes Research & Care | 15 papers in training set | Top 0.3% | 4.0%
7 | JAMIA Open | 37 papers in training set | Top 0.4% | 4.0%
8 | PLOS Digital Health | 91 papers in training set | Top 0.7% | 3.6%
9 | International Journal of Environmental Research and Public Health | 124 papers in training set | Top 2% | 3.1%
50% of probability mass above
10 | JMIR Medical Informatics | 17 papers in training set | Top 0.4% | 3.1%
11 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 1.0% | 2.4%
12 | The Journal of Clinical Endocrinology & Metabolism | 35 papers in training set | Top 0.5% | 2.4%
13 | Scientific Reports | 3102 papers in training set | Top 47% | 2.4%
14 | BMJ Open | 554 papers in training set | Top 8% | 2.1%
15 | Journal of Affective Disorders | 81 papers in training set | Top 0.9% | 1.9%
16 | Journal of Racial and Ethnic Health Disparities | 11 papers in training set | Top 0.2% | 1.7%
17 | BMC Public Health | 147 papers in training set | Top 4% | 1.5%
18 | PeerJ | 261 papers in training set | Top 8% | 1.5%
19 | BMC Medical Informatics and Decision Making | 39 papers in training set | Top 2% | 1.2%
20 | Public Health Nutrition | 14 papers in training set | Top 0.5% | 1.1%
21 | The Journal of Pediatrics | 15 papers in training set | Top 0.5% | 0.9%
22 | BMC Medical Research Methodology | 43 papers in training set | Top 1.0% | 0.9%
23 | Diabetes, Obesity and Metabolism | 17 papers in training set | Top 0.4% | 0.9%
24 | International Journal of Behavioral Nutrition and Physical Activity | 15 papers in training set | Top 0.4% | 0.9%
25 | Clinical Infectious Diseases | 231 papers in training set | Top 4% | 0.9%
26 | JMIR Research Protocols | 18 papers in training set | Top 1% | 0.8%
27 | Healthcare | 16 papers in training set | Top 2% | 0.8%
28 | Human Brain Mapping | 295 papers in training set | Top 4% | 0.7%
29 | European Journal of Public Health | 20 papers in training set | Top 1% | 0.7%
30 | Human Molecular Genetics | 130 papers in training set | Top 4% | 0.7%
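
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked against the listed percentages with a short cumulative sum. This is only an illustrative sketch: the values are the rounded figures from the list, and the function name `ranks_to_reach` is hypothetical, not part of the page's own tooling.

```python
# Predicted probabilities (%) for the ten highest-ranked journals,
# copied from the list above (rounded values as displayed).
probs = [18.6, 6.3, 4.0, 4.0, 4.0, 4.0, 4.0, 3.6, 3.1, 3.1]


def ranks_to_reach(probabilities, threshold):
    """Return (k, cumulative) for the smallest k whose running total
    reaches the threshold, or None if the threshold is never reached."""
    total = 0.0
    for k, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return k, total
    return None


# Hypothetical helper applied to the listed values with a 50% threshold.
result = ranks_to_reach(probs, 50.0)
print(result)
```

With the rounded values shown, the running total is 48.5% after rank 8 and crosses 50% at rank 9, which agrees with the marker in the list.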