
Classification of Adolescent Drinking via Behavioral, Biological, and Environmental Features: A Machine Learning Approach with Bias Control

Liu, R.; Azzam, M.; Zabik, N.; Wan, S.; Blackford, J.; Wang, J.

2026-02-26 addiction medicine
10.64898/2026.02.24.26347002 medRxiv

In 2024, approximately 30% of U.S. adolescents reported having consumed alcohol at least once in their lifetime, with about 25% of these individuals engaging in binge drinking. Adolescent alcohol use is associated with neurodevelopmental impairments, elevated risk of later alcohol use, and mental health disorders. These findings underscore the importance of identifying the variables that drive adolescent alcohol use and leveraging them for early identification and targeted intervention. Previous studies have typically developed machine-learning classification models that combine neuroimaging data with a limited set of clinical measurements. Neuroimaging data are expensive and difficult to obtain at scale, whereas clinical measures are more practical for large-scale screening because of their low cost and widespread accessibility. However, clinical-only approaches to classifying alcohol use remain largely underexplored. Furthermore, prior studies have often focused on adults, limiting generalizability to the adolescent population. Additionally, confounding factors such as age and substance use, which are strongly correlated with alcohol consumption, have often been inadequately addressed, potentially inflating classification performance. Finally, class imbalance remains a persistent challenge, with prior attempts yielding only limited improvements. To address these limitations, we propose FocalTab, a framework that integrates TabPFN with focal loss for robust generalization and effective mitigation of class imbalance. The approach also includes an initial preprocessing step that removes confounding factors to account for age and substance use. We compare FocalTab against state-of-the-art methods across different variable selections and dataset settings. FocalTab achieves the highest accuracy (84.3%) and specificity (80.0%) in the most stringent setting, in which both age and substance-use variables are excluded, whereas competing models drop to near-chance specificity (12-24%). We further apply SHapley Additive exPlanations (SHAP) analysis to identify key clinical predictors of drinker classification, supporting enhanced screening and early intervention.
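The abstract names focal loss as the mechanism for handling class imbalance but does not give its form. As a rough illustration only, the sketch below implements the standard binary focal loss, which down-weights examples the model already classifies confidently so that hard, often minority-class, examples dominate the average. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; the abstract does not state the values FocalTab uses or how the loss is attached to TabPFN.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss (illustrative sketch, not the authors' code).

    y_true: iterable of 0/1 labels.
    p_pred: iterable of predicted probabilities for class 1.
    gamma:  focusing parameter; 0 reduces to alpha-weighted cross-entropy.
    alpha:  weight on the positive class; (1 - alpha) on the negative class.
    """
    total = 0.0
    count = 0
    for y, p in zip(y_true, p_pred):
        # Clip to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # Probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
        count += 1
    return total / count


if __name__ == "__main__":
    easy = binary_focal_loss([1], [0.9])
    hard = binary_focal_loss([1], [0.6])
    print(f"easy example: {easy:.6f}, hard example: {hard:.6f}")
```

With `gamma=0` and `alpha=0.5` the function reduces to half the ordinary binary cross-entropy, which is a convenient sanity check when tuning the two parameters.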

Matching journals

The top 11 journals account for 50% of the predicted probability mass.

1. Human Brain Mapping; 295 papers in training set; Top 0.5%; 10.7%
2. Frontiers in Artificial Intelligence; 18 papers in training set; Top 0.1%; 8.7%
3. Statistics in Medicine; 34 papers in training set; Top 0.1%; 5.0%
4. JAMA Network Open; 127 papers in training set; Top 0.6%; 4.4%
5. Frontiers in Psychiatry; 83 papers in training set; Top 0.9%; 4.1%
6. Translational Psychiatry; 219 papers in training set; Top 1%; 4.1%
7. Computational Psychiatry; 12 papers in training set; Top 0.1%; 3.8%
8. Drug and Alcohol Dependence; 37 papers in training set; Top 0.2%; 3.7%
9. Scientific Reports; 3102 papers in training set; Top 43%; 2.8%
10. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging; 62 papers in training set; Top 0.6%; 2.7%
11. Neuropsychopharmacology; 134 papers in training set; Top 1%; 2.7%
50% of probability mass above
12. Nature Communications; 4913 papers in training set; Top 46%; 2.1%
13. Addiction Neuroscience; 17 papers in training set; Top 0.2%; 2.1%
14. Addiction Biology; 47 papers in training set; Top 0.5%; 1.9%
15. Nature Human Behaviour; 85 papers in training set; Top 2%; 1.9%
16. PLOS ONE; 4510 papers in training set; Top 49%; 1.9%
17. Molecular Psychiatry; 242 papers in training set; Top 2%; 1.9%
18. Biological Psychiatry; 119 papers in training set; Top 2%; 1.7%
19. Nature Mental Health; 18 papers in training set; Top 0.1%; 1.7%
20. npj Digital Medicine; 97 papers in training set; Top 2%; 1.5%
21. Nature Medicine; 117 papers in training set; Top 2%; 1.5%
22. Addiction; 25 papers in training set; Top 0.3%; 1.4%
23. PLOS Digital Health; 91 papers in training set; Top 2%; 1.4%
24. Bioinformatics; 1061 papers in training set; Top 8%; 1.4%
25. Biological Psychiatry Global Open Science; 54 papers in training set; Top 1%; 1.0%
26. Communications Medicine; 85 papers in training set; Top 0.7%; 0.9%
27. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 40%; 0.9%
28. Patterns; 70 papers in training set; Top 2%; 0.8%
29. Cerebral Cortex; 357 papers in training set; Top 2%; 0.8%
30. eLife; 5422 papers in training set; Top 56%; 0.8%