A Machine Learning Based Causal Interface for Time-Varying Environmental Predictors of Substance Use Initiation in the ABCD Study

Wei, M.; Yadlapati, L.; Peng, Q.

2026-04-17 addiction medicine
10.64898/2026.04.15.26350988 medRxiv

Background: The Adolescent Brain Cognitive Development (ABCD) Study provides rich longitudinal data on environmental, genetic, and behavioral factors related to substance use initiation. Classical marginal structural models (MSMs) require selecting covariates for propensity models, which is challenging when there are many correlated predictors.

Methods: We analyzed longitudinal panel data from 11,868 ABCD participants with repeated observations over time. Interval-level binary outcomes were defined for initiation of alcohol, nicotine, cannabis, and any substance, including only participants at risk before initiation. All predictors were constructed as lagged variables to preserve temporal ordering. We used a two-stage machine learning-based causal framework. First, we performed graph discovery using a Granger-inspired lagged predictive modeling approach with elastic-net logistic regression to identify relationships between past predictors and future outcomes. Stable candidate edges were selected using subject-level bootstrap stability selection. Second, we estimated adjusted effects for stable predictors using double machine learning (DML) with partialling-out and cross-fitting. For each predictor, the lagged variable was treated as the exposure and adjusted for high-dimensional lagged covariates. Cross-fitting with group-based splitting accounted for within-subject dependence. Nuisance functions were estimated using random forests, and cluster-robust standard errors were used for inference.

Results: We identified stable predictors across multiple domains, including sleep patterns, family environment, peer relationships, behavioral traits, and genetic risk. Many predictors were shared across substance outcomes, while some were outcome-specific. Effect sizes were modest, typically ranging from -0.01 to 0.02 per standard deviation increase in the predictor. Both risk-increasing and protective associations were observed. Risk factors included sleep disturbance and behavioral risk indicators, while protective factors included parental monitoring and structured environments.

Conclusions: This study presents a practical framework for analyzing high-dimensional longitudinal data and identifying time-varying predictors of substance use initiation. The approach combines machine learning for variable selection with causal inference for effect estimation. The results highlight both shared and outcome-specific risk factors and identify modifiable targets, such as family environment and sleep, that may inform prevention strategies.
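
The second stage described under Methods (partialling-out DML with group-based cross-fitting, random-forest nuisance models, and cluster-robust standard errors) can be illustrated with a minimal sketch. This is not the authors' code. The function names `dml_partialling_out` and `simulate_panel`, the hyperparameters, and the synthetic data are all assumptions made for illustration, and the toy outcome is continuous rather than the binary initiation outcome used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold


def dml_partialling_out(y, d, X, groups, n_splits=5, seed=0):
    """Cross-fitted partialling-out estimate of the effect of d on y.

    y: outcome, d: lagged exposure, X: lagged covariates (n x p),
    groups: subject IDs, so every row of a subject lands in the same fold.
    Returns (theta_hat, cluster_robust_se).
    """
    y, d, X = np.asarray(y, float), np.asarray(d, float), np.asarray(X, float)
    groups = np.asarray(groups)
    y_res, d_res = np.zeros_like(y), np.zeros_like(d)

    for train, test in GroupKFold(n_splits=n_splits).split(X, y, groups):
        # Nuisance models are fit on other subjects only (cross-fitting).
        for target, res in ((y, y_res), (d, d_res)):
            rf = RandomForestRegressor(
                n_estimators=100, min_samples_leaf=5, random_state=seed
            )
            rf.fit(X[train], target[train])
            res[test] = target[test] - rf.predict(X[test])

    # Residual-on-residual regression without intercept.
    denom = np.sum(d_res ** 2)
    theta = np.sum(d_res * y_res) / denom

    # Cluster-robust sandwich variance: sum score contributions per subject.
    scores = d_res * (y_res - theta * d_res)
    _, subject_idx = np.unique(groups, return_inverse=True)
    cluster_sums = np.bincount(subject_idx, weights=scores)
    se = np.sqrt(np.sum(cluster_sums ** 2)) / denom
    return theta, se


def simulate_panel(n_subjects=150, n_waves=4, true_effect=0.3, seed=0):
    """Hypothetical synthetic panel for illustration only (not ABCD data)."""
    rng = np.random.default_rng(seed)
    n = n_subjects * n_waves
    groups = np.repeat(np.arange(n_subjects), n_waves)
    X = rng.normal(size=(n, 5))
    subject_effect = rng.normal(scale=0.5, size=n_subjects)[groups]
    d = 0.5 * X[:, 0] + rng.normal(size=n)
    noise = rng.normal(scale=0.5, size=n)
    y = true_effect * d + np.sin(X[:, 1]) + subject_effect + noise
    return y, d, X, groups


y, d, X, groups = simulate_panel()
theta_hat, se_hat = dml_partialling_out(y, d, X, groups)
print(f"theta_hat = {theta_hat:.3f}, cluster-robust SE = {se_hat:.3f}")
```

Splitting folds by subject rather than by row keeps a participant's repeated observations out of the models used to residualize that participant's own data, which is the role the abstract assigns to group-based splitting.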

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Drug and Alcohol Dependence: 37 papers in training set, Top 0.1%, 17.3%
2. Developmental Cognitive Neuroscience: 81 papers in training set, Top 0.1%, 8.3%
3. Frontiers in Psychiatry: 83 papers in training set, Top 0.5%, 6.7%
4. Human Brain Mapping: 295 papers in training set, Top 1%, 6.3%
5. JAMA Network Open: 127 papers in training set, Top 0.4%, 6.3%
6. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging: 62 papers in training set, Top 0.3%, 4.8%
7. Statistics in Medicine: 34 papers in training set, Top 0.1%, 3.9%
50% of probability mass above
8. Biological Psychiatry Global Open Science: 54 papers in training set, Top 0.2%, 3.6%
9. Addiction: 25 papers in training set, Top 0.2%, 3.6%
10. International Journal of Drug Policy: 11 papers in training set, Top 0.1%, 2.6%
11. Computational Psychiatry: 12 papers in training set, Top 0.1%, 2.1%
12. Journal of Child Psychology and Psychiatry: 25 papers in training set, Top 0.2%, 2.1%
13. PLOS ONE: 4510 papers in training set, Top 49%, 2.1%
14. Biological Psychiatry: 119 papers in training set, Top 1%, 1.9%
15. Addiction Biology: 47 papers in training set, Top 0.5%, 1.9%
16. The British Journal of Psychiatry: 21 papers in training set, Top 0.5%, 1.8%
17. Neuropsychopharmacology: 134 papers in training set, Top 1%, 1.8%
18. American Journal of Epidemiology: 57 papers in training set, Top 0.7%, 1.7%
19. Frontiers in Artificial Intelligence: 18 papers in training set, Top 0.3%, 1.7%
20. Translational Psychiatry: 219 papers in training set, Top 3%, 1.5%
21. European Journal of Epidemiology: 40 papers in training set, Top 0.4%, 1.5%
22. Scientific Reports: 3102 papers in training set, Top 67%, 1.2%
23. Genes, Brain and Behavior: 29 papers in training set, Top 0.2%, 1.1%
24. Behavior Genetics: 15 papers in training set, Top 0.1%, 0.9%
25. The Lancet Public Health: 20 papers in training set, Top 0.6%, 0.8%
26. European Child & Adolescent Psychiatry: 14 papers in training set, Top 0.4%, 0.8%
27. PLOS Digital Health: 91 papers in training set, Top 3%, 0.7%
28. Communications Biology: 886 papers in training set, Top 24%, 0.7%
29. PLOS Medicine: 98 papers in training set, Top 5%, 0.7%
30. SLEEP: 28 papers in training set, Top 0.4%, 0.7%
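
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below assumes the cutoff is the smallest rank at which the running total reaches at least 50%. The helper name `journals_needed` is hypothetical, and only the first eight listed values are used.

```python
from itertools import accumulate

# Predicted probabilities (%) for the first eight journals, in rank order.
top_probabilities = [17.3, 8.3, 6.7, 6.3, 6.3, 4.8, 3.9, 3.6]


def journals_needed(probs, threshold=50.0):
    """Return the smallest rank whose cumulative probability reaches threshold."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None  # threshold never reached with the values given


cutoff = journals_needed(top_probabilities)
print(cutoff)  # 7
```

The first six journals sum to about 49.7%, just under the threshold, and adding the seventh brings the total to about 53.6%.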