
A Machine Learning Based Causal Interface for Time-Varying Environmental Predictors of Substance Use Initiation in the ABCD Study

Wei, M.; Yadlapati, L.; Peng, Q.

2026-04-17 addiction medicine
10.64898/2026.04.15.26350988 medRxiv

Background: The Adolescent Brain Cognitive Development (ABCD) Study® offers rich longitudinal data on environmental, genetic, and other factors related to substance use initiation. Classical marginal structural models (MSMs) require selecting covariates for propensity models, which is challenging in the presence of hundreds of correlated predictors.

Methods: We analyzed longitudinal panel data from 11,868 ABCD participants, where each individual contributed repeated observations over time. Interval-level binary outcomes were defined for initiation of alcohol, nicotine, cannabis, and any substance, restricting analyses to participants at risk prior to initiation. All predictors were constructed as lagged variables to preserve temporal ordering. We implemented a two-stage machine learning-based causal framework. First, we performed graph discovery using a Granger-inspired lagged predictive modeling approach, applying elastic-net logistic regression to identify predictive relationships between lagged environmental variables and future initiation outcomes. Robust candidate edges were selected using subject-level bootstrap stability selection. Second, we estimated adjusted effect sizes for stable edges using double machine learning (DML)-style partialling-out with cross-fitting. For each candidate predictor, the treatment was defined as the lagged variable of interest and adjusted for high-dimensional lagged covariates. Cross-fitting with group-based splitting accounted for within-subject dependence, and nuisance functions were estimated using random forest models. Cluster-robust standard errors were used for inference.

Results: We identified a set of stable predictors across multiple domains, including sleep patterns, family environment, peer relationships, behavioral traits, and genetic risk. Many predictors were shared across substance outcomes, while some were outcome-specific. Estimated effect sizes were modest, typically ranging from -0.01 to 0.02 per standard deviation increase in the predictor. Both risk-increasing and protective associations were observed. Risk factors included sleep disturbance and behavioral risk indicators, while protective factors included parental monitoring and structured environments.

Conclusions: This study provides a practical framework for analyzing high-dimensional longitudinal data and identifying time-varying predictors of substance use initiation. The approach combines machine learning for variable selection with causal inference methods for effect estimation. The results highlight both shared and substance-specific risk factors and identify modifiable targets, such as family environment and sleep, that may inform prevention strategies.
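
The second-stage estimator described in the abstract (partialling-out with group-based cross-fitting and cluster-robust standard errors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are simulated, the outcome is continuous rather than binary, there is a single lagged covariate, and a one-covariate least-squares fit stands in for the random forest nuisance models so that the sketch runs on the Python standard library alone. All function names (`fit_line`, `out_of_fold_residuals`, `dml_partial_out`, `simulate_panel`) and all simulation parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_line(x, y):
    """Least-squares (intercept, slope) for one covariate.
    Assumption: stands in for the random forest nuisance learner."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def out_of_fold_residuals(x, target, groups, n_folds=5, seed=0):
    """Residualize target on x with cross-fitting; folds are split by subject,
    so a subject's rows are never in both the training and held-out sets."""
    subjects = sorted(set(groups))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}
    resid = [0.0] * len(x)
    for k in range(n_folds):
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        held_out = [i for i, g in enumerate(groups) if fold_of[g] == k]
        a, b = fit_line([x[i] for i in train], [target[i] for i in train])
        for i in held_out:
            resid[i] = target[i] - (a + b * x[i])
    return resid


def dml_partial_out(y, d, x, groups, n_folds=5, seed=0):
    """Partialling-out estimate of the effect of d on y, with a
    cluster-robust (by subject) standard error, no small-sample correction."""
    ry = out_of_fold_residuals(x, y, groups, n_folds, seed)  # same folds for
    rd = out_of_fold_residuals(x, d, groups, n_folds, seed)  # both nuisances
    sdd = sum(v * v for v in rd)
    theta = sum(a * b for a, b in zip(rd, ry)) / sdd
    # Sum the score rd * (ry - theta * rd) within each subject before squaring.
    score = defaultdict(float)
    for g, a, b in zip(groups, rd, ry):
        score[g] += a * (b - theta * a)
    se = math.sqrt(sum(s * s for s in score.values())) / sdd
    return theta, se


def simulate_panel(n_subjects=400, n_waves=4, theta=0.5, seed=1):
    """Hypothetical panel: x confounds d and y; u is a subject-level effect."""
    rng = random.Random(seed)
    y, d, x, groups = [], [], [], []
    for s in range(n_subjects):
        u = rng.gauss(0, 0.5)
        for _ in range(n_waves):
            xi = rng.gauss(0, 1)
            di = 0.8 * xi + rng.gauss(0, 1)
            yi = theta * di + 1.5 * xi + u + rng.gauss(0, 1)
            y.append(yi); d.append(di); x.append(xi); groups.append(s)
    return y, d, x, groups


y, d, x, groups = simulate_panel()
theta_hat, se = dml_partial_out(y, d, x, groups)
naive_slope = fit_line(d, y)[1]  # unadjusted regression of y on d
print(f"naive slope: {naive_slope:.3f}")
print(f"partialled-out estimate: {theta_hat:.3f} (cluster-robust SE {se:.3f})")
```

Splitting folds by subject rather than by row keeps all of a participant's observations on one side of each train/held-out split, which is what group-based cross-fitting is meant to guarantee. In this simulated panel the naive slope is pulled away from the true value of 0.5 by the shared covariate, while the partialled-out estimate should land close to it.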

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | Drug and Alcohol Dependence | 37 papers in training set | Top 0.1% | 14.5%
2 | Developmental Cognitive Neuroscience | 81 papers in training set | Top 0.1% | 6.9%
3 | Human Brain Mapping | 295 papers in training set | Top 0.9% | 6.9%
4 | Frontiers in Psychiatry | 83 papers in training set | Top 0.5% | 6.9%
5 | Biological Psychiatry: Cognitive Neuroscience and Neuroimaging | 62 papers in training set | Top 0.2% | 6.4%
6 | JAMA Network Open | 127 papers in training set | Top 0.4% | 6.4%
7 | Statistics in Medicine | 34 papers in training set | Top 0.1% | 4.3%
50% of probability mass above
8 | Biological Psychiatry Global Open Science | 54 papers in training set | Top 0.2% | 3.6%
9 | Addiction | 25 papers in training set | Top 0.2% | 3.6%
10 | Computational Psychiatry | 12 papers in training set | Top 0.1% | 2.1%
11 | Biological Psychiatry | 119 papers in training set | Top 1% | 2.1%
12 | International Journal of Drug Policy | 11 papers in training set | Top 0.2% | 2.1%
13 | Neuropsychopharmacology | 134 papers in training set | Top 1% | 2.1%
14 | Journal of Child Psychology and Psychiatry | 25 papers in training set | Top 0.2% | 2.1%
15 | The British Journal of Psychiatry | 21 papers in training set | Top 0.4% | 2.1%
16 | PLOS ONE | 4510 papers in training set | Top 49% | 1.9%
17 | European Journal of Epidemiology | 40 papers in training set | Top 0.3% | 1.7%
18 | Frontiers in Artificial Intelligence | 18 papers in training set | Top 0.3% | 1.7%
19 | Addiction Biology | 47 papers in training set | Top 0.5% | 1.7%
20 | American Journal of Epidemiology | 57 papers in training set | Top 0.8% | 1.5%
21 | The Lancet Public Health | 20 papers in training set | Top 0.4% | 1.2%
22 | Translational Psychiatry | 219 papers in training set | Top 3% | 1.2%
23 | Scientific Reports | 3102 papers in training set | Top 68% | 1.0%
24 | Genes, Brain and Behavior | 29 papers in training set | Top 0.3% | 0.9%
25 | BMC Medicine | 163 papers in training set | Top 6% | 0.9%
26 | PLOS Digital Health | 91 papers in training set | Top 2% | 0.8%
27 | European Child & Adolescent Psychiatry | 14 papers in training set | Top 0.4% | 0.8%
28 | Nature Mental Health | 18 papers in training set | Top 0.3% | 0.8%
29 | SLEEP | 28 papers in training set | Top 0.4% | 0.7%
30 | PLOS Genetics | 756 papers in training set | Top 17% | 0.6%