Predicting Traffic Accident Injury Severity Using Ensemble Machine Learning Models: Incident Level and Generalized Insights via Explainable AI

Zhang, E. R.; Mermer, O.; Demir, I.

2026-04-20 occupational and environmental health
10.64898/2026.04.13.26350778 medRxiv

Road traffic accidents represent a global public safety crisis, necessitating advanced computational tools for accurate injury severity prediction and effective decision support. This study evaluates high-performing ensemble machine learning models, including AdaBoost, XGBoost, LightGBM, HistGBRT, CatBoost, Gradient Boosting, NGBoost, and Random Forest, using a comprehensive National Highway Traffic Safety Administration (NHTSA) dataset from 2018 to 2022. While all models demonstrated exceptional predictive accuracy, with HistGBRT achieving the highest overall accuracy of 92.26%, a defining achievement of this work is the perfect classification (100% precision and recall) of fatal injuries across all ensemble architectures. To bridge the gap between predictive performance and actionable intelligence, this research integrates SHapley Additive exPlanations (SHAP) to provide both global insights into dataset-wide risk factors and local, instance-specific rationales for individual crash events. The global analysis identified ethnicity, airbag deployment, and harmful event type as primary drivers of injury severity, while local force and waterfall plots revealed the precise "push and pull" of variables for specific incidents. The results offer a robust, interpretable framework for stakeholders tasked with improving traffic safety and mitigating crash-related harm.
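The local "push and pull" mentioned in the abstract can be illustrated with exact Shapley values on a toy model. This is only a sketch, not the paper's pipeline: it does not use the SHAP library or any of the ensemble models, and the scoring function, feature names, and weights (`toy_risk_score`, `airbag_deployed`, `rollover`, `seatbelt_used`) are hypothetical, chosen purely for illustration.

```python
from itertools import permutations


def toy_risk_score(features):
    """Hypothetical severity score for one crash record (NOT the paper's model)."""
    score = 0.10  # score when every feature is at its baseline (False) value
    if features["airbag_deployed"]:
        score += 0.30
    if features["rollover"]:
        score += 0.25
    if features["airbag_deployed"] and features["rollover"]:
        score += 0.15  # interaction term, shared between the two features
    if features["seatbelt_used"]:
        score -= 0.20
    return score


def exact_shapley(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features switch from baseline to instance."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical incident and reference point.
instance = {"airbag_deployed": True, "rollover": True, "seatbelt_used": True}
baseline = {"airbag_deployed": False, "rollover": False, "seatbelt_used": False}

phi = exact_shapley(toy_risk_score, instance, baseline)
for name, value in phi.items():
    direction = "pushes up" if value > 0 else "pulls down"
    print(f"{name}: {value:+.3f} ({direction})")
```

The attributions sum to the difference between the score for the incident and the score for the baseline, which is the property that force and waterfall plots visualize. Enumerating all orderings is only feasible for a handful of features; the SHAP library relies on faster model-specific or approximate algorithms instead.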

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Scientific Reports | 3102 papers in training set | Top 0.7% | 19.1%
2. PLOS ONE | 4510 papers in training set | Top 18% | 10.4%
3. Bioengineering | 24 papers in training set | Top 0.1% | 8.6%
4. International Journal of Environmental Research and Public Health | 124 papers in training set | Top 0.6% | 7.4%
5. Heliyon | 146 papers in training set | Top 0.1% | 5.0%
50% of probability mass above
6. Communications Biology | 886 papers in training set | Top 0.8% | 4.5%
7. Chaos, Solitons & Fractals | 32 papers in training set | Top 0.6% | 3.3%
8. PLOS Global Public Health | 293 papers in training set | Top 3% | 2.4%
9. Frontiers in Bioengineering and Biotechnology | 88 papers in training set | Top 0.9% | 2.4%
10. Environmental Research Letters | 15 papers in training set | Top 0.2% | 2.4%
11. Frontiers in Public Health | 140 papers in training set | Top 3% | 2.1%
12. Science of The Total Environment | 179 papers in training set | Top 3% | 1.7%
13. Sensors | 39 papers in training set | Top 0.9% | 1.7%
14. npj Digital Medicine | 97 papers in training set | Top 2% | 1.7%
15. PLOS Computational Biology | 1633 papers in training set | Top 17% | 1.5%
16. The Innovation | 12 papers in training set | Top 0.4% | 1.5%
17. Expert Systems with Applications | 11 papers in training set | Top 0.2% | 1.3%
18. Journal of Medical Internet Research | 85 papers in training set | Top 4% | 1.0%
19. Patterns | 70 papers in training set | Top 2% | 1.0%
20. Frontiers in Sports and Active Living | 10 papers in training set | Top 0.3% | 0.9%
21. JMIR Public Health and Surveillance | 45 papers in training set | Top 3% | 0.9%
22. Frontiers in Computational Neuroscience | 53 papers in training set | Top 2% | 0.8%
23. Wellcome Open Research | 57 papers in training set | Top 2% | 0.8%
24. Computers in Biology and Medicine | 120 papers in training set | Top 4% | 0.8%
25. Frontiers in Immunology | 586 papers in training set | Top 8% | 0.7%
26. Nature | 575 papers in training set | Top 16% | 0.7%
27. ACS Nano | 99 papers in training set | Top 5% | 0.5%
28. BMC Public Health | 147 papers in training set | Top 7% | 0.5%
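
The 50% cutoff in the ranking above can be checked with a running sum of the listed probabilities. The sketch below assumes the displayed percentages are rounded values taken directly from the list; the helper name `journals_to_reach` is hypothetical and not part of any published tool.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative mass) for the smallest top-ranked prefix
    whose summed probability reaches the threshold, or (None, total) if
    the threshold is never reached."""
    total = 0.0
    for count, probability in enumerate(probabilities, start=1):
        total += probability
        if total >= threshold:
            return count, total
    return None, total


# Predicted probabilities (in percent) for the seven top-ranked journals,
# copied from the list above in rank order.
top_probs = [19.1, 10.4, 8.6, 7.4, 5.0, 4.5, 3.3]

count, mass = journals_to_reach(top_probs, 50.0)
print(f"{count} journals cover {mass:.1f}% of the probability mass")
```

With these rounded figures, the running sum first reaches the 50% threshold at the fifth journal, which agrees with the statement that the top 5 journals account for 50% of the predicted probability mass.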