
Identifying Key Predictive Features for Opioid Use Disorder Using Machine Learning

Akhter, S.; Miller, J. H.

2025-07-15 health informatics
10.1101/2025.07.12.25331446 medRxiv

Background
Opioid Use Disorder (OUD) remains a pressing public health challenge across the United States, underscoring the need for early and accurate risk assessment tools that support prompt prevention and intervention. Machine learning methods have emerged as valuable tools for analyzing complex medical datasets and aiding clinical decisions. However, their effectiveness and interpretability depend heavily on the relevance and quality of the selected input features.

Objective
In this work, we conducted a comprehensive comparison of three distinct feature selection strategies to identify the most predictive indicators of OUD: Alternating Decision Tree (ADT)-based scoring, Cross-Validated Feature Evaluation (CVFE), and Hypergraph-Based Feature Evaluation (HFE).

Methods
The analysis used data from the 2023 National Survey on Drug Use and Health (NSDUH), compiled by RTI International under the direction of the Substance Abuse and Mental Health Services Administration (SAMHSA). The dataset covers a broad range of demographic, behavioral, mental health, and substance use features. Each feature selection method yielded a set of important predictors, which were then used to train eXtreme Gradient Boosting (XGBoost) classification models. To improve model transparency and interpretability, SHapley Additive exPlanations (SHAP) were used to show how individual variables influence model predictions.

Results
The performance of the models was evaluated and compared. The model trained on CVFE-selected features performed best, with a predictive accuracy of 79.11% and an area under the curve (AUC) of 0.8652. Based on SHAP value rankings from this best-performing model, the 10 most influential features were past-year misuse of pain relievers, recent alcohol use disorder, age group, history of asthma, receipt of substance use treatment in the past year, educational attainment, household size, total household income, marital status, and race/ethnicity. A web application, available at https://shiny.tricities.wsu.edu/oud-prediction/, provides prediction outcomes, probability metrics, and a SHAP visualization generated from the best model, built with the cross-validation-based (CVFE) approach.

Conclusions
The findings highlight the importance of effective feature selection for improving both model accuracy and interpretability, supporting the development of practical, data-driven approaches that may help healthcare providers assess OUD risk and tailor prevention strategies to individual needs.

Trial registration
Not applicable; this research is not a clinical trial.
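The feature ranking in the Results rests on SHAP values, which are Shapley values from cooperative game theory. To make that definition concrete, here is a minimal sketch. It computes exact Shapley values by brute force for a toy four-feature risk model, then ranks features by mean absolute value, which is the usual way a global SHAP importance ranking is formed. Several parts are assumptions made for illustration only. The feature names, weights, and data rows are hypothetical and are not the authors' XGBoost model or NSDUH data. The value function averages predictions over a small background set, which is an interventional choice. The shap library does not enumerate coalitions for tree ensembles like XGBoost; it uses the much faster TreeSHAP algorithm instead.

```python
import math
from itertools import combinations

# HYPOTHETICAL feature names (loosely echoing the abstract) and made-up weights.
# This is NOT the authors' XGBoost model or NSDUH data.
FEATURES = ["pain_reliever_misuse", "alcohol_use_disorder", "age_group", "asthma"]


def toy_model(x):
    """Toy risk score: logistic of a linear term plus one interaction (illustrative only)."""
    z = -2.0 + 1.8 * x[0] + 0.9 * x[1] - 0.3 * x[2] + 0.4 * x[3] + 0.5 * x[0] * x[1]
    return 1.0 / (1.0 + math.exp(-z))


def coalition_value(model, x, background, coalition):
    """v(S): mean prediction when features in S take x's values and the others come from background rows."""
    total = 0.0
    for b in background:
        mixed = [x[j] if j in coalition else b[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition; cost grows as 2**n."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = coalition_value(model, x, background, s | {i}) - coalition_value(model, x, background, s)
                phi[i] += weight * gain
    return phi


def rank_by_mean_abs_shap(model, rows, background, names):
    """Global importance: mean |phi_i| over the explained rows, sorted from largest to smallest."""
    totals = [0.0] * len(names)
    for x in rows:
        for i, p in enumerate(shapley_values(model, x, background)):
            totals[i] += abs(p)
    means = [t / len(rows) for t in totals]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    background = [[0, 0, 1, 0], [0, 1, 2, 0], [1, 0, 3, 1]]  # made-up reference rows
    rows = [[1, 1, 1, 0], [0, 0, 4, 1], [1, 0, 2, 1]]  # made-up rows to explain
    for name, score in rank_by_mean_abs_shap(toy_model, rows, background, FEATURES):
        print(f"{name:22s} {score:.4f}")
```

A useful sanity check is the efficiency property: for any row, the Shapley values sum to the model's prediction for that row minus the mean prediction over the background set. Because the number of coalitions doubles with each added feature, brute force is only practical for a handful of features. That is why tree-specific algorithms are used in practice.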

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. International Journal of Medical Informatics: 25 papers in training set, Top 0.1%, 21.7%
2. JAMIA Open: 37 papers in training set, Top 0.1%, 6.9%
3. BMC Medical Research Methodology: 43 papers in training set, Top 0.1%, 6.1%
4. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 0.5%, 6.1%
5. npj Digital Medicine: 97 papers in training set, Top 0.8%, 6.1%
6. JMIR Medical Informatics: 17 papers in training set, Top 0.2%, 4.4%
50% of probability mass above
7. Journal of the American Medical Informatics Association: 61 papers in training set, Top 0.7%, 4.0%
8. JMIR Public Health and Surveillance: 45 papers in training set, Top 0.5%, 3.8%
9. Journal of Medical Internet Research: 85 papers in training set, Top 1%, 3.5%
10. PLOS ONE: 4510 papers in training set, Top 41%, 3.5%
11. International Journal of Drug Policy: 11 papers in training set, Top 0.1%, 3.5%
12. PLOS Digital Health: 91 papers in training set, Top 0.8%, 3.1%
13. Frontiers in Digital Health: 20 papers in training set, Top 0.3%, 3.0%
14. Journal of Biomedical Informatics: 45 papers in training set, Top 0.6%, 2.6%
15. Scientific Reports: 3102 papers in training set, Top 60%, 1.6%
16. Frontiers in Artificial Intelligence: 18 papers in training set, Top 0.3%, 1.6%
17. JMIR Formative Research: 32 papers in training set, Top 1%, 1.2%
18. Frontiers in Public Health: 140 papers in training set, Top 6%, 1.1%
19. Drug and Alcohol Dependence: 37 papers in training set, Top 0.5%, 0.9%
20. Journal of Affective Disorders: 81 papers in training set, Top 1%, 0.9%
21. BJPsych Open: 25 papers in training set, Top 0.7%, 0.8%
22. Computers in Biology and Medicine: 120 papers in training set, Top 4%, 0.8%
23. Heliyon: 146 papers in training set, Top 7%, 0.7%
24. BMC Health Services Research: 42 papers in training set, Top 2%, 0.7%
25. BMJ Health & Care Informatics: 13 papers in training set, Top 1%, 0.7%
26. Psychiatry Research: 35 papers in training set, Top 2%, 0.6%
27. Statistics in Medicine: 34 papers in training set, Top 0.4%, 0.6%
28. International Journal of Environmental Research and Public Health: 124 papers in training set, Top 8%, 0.6%
29. Nicotine and Tobacco Research: 13 papers in training set, Top 0.3%, 0.6%
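The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with simple arithmetic. The short sketch below adds up the displayed percentages in rank order until it reaches a target. It only recomputes the rounded figures shown in the list and says nothing about how the journal-matching model produced them. Because each value is rounded to 0.1 percentage points, the cumulative totals are approximate. The helper name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (%) copied from the ranked journal list, in rank order (rounded as displayed).
PROBS_PCT = [21.7, 6.9, 6.1, 6.1, 6.1, 4.4, 4.0, 3.8, 3.5, 3.5,
             3.5, 3.1, 3.0, 2.6, 1.6, 1.6, 1.2, 1.1, 0.9, 0.9,
             0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6]


def journals_to_reach(probs_pct, target_pct):
    """Return (k, cumulative) for the smallest k whose top-k probabilities sum to >= target_pct."""
    cumulative = 0.0
    for k, p in enumerate(probs_pct, start=1):
        cumulative += p
        if cumulative >= target_pct:
            return k, cumulative
    return None, cumulative  # target not reached by the listed journals


if __name__ == "__main__":
    k, mass = journals_to_reach(PROBS_PCT, 50.0)
    print(f"top {k} journals cover {mass:.1f}% of the predicted probability mass")
```

With the displayed values, the first five journals sum to 46.9%, and adding the sixth brings the total to 51.3%. That is consistent with the 50% marker placed after rank 6.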