Predicting Traffic Accident Injury Severity Using Ensemble Machine Learning Models: Incident Level and Generalized Insights via Explainable AI

Zhang, E. R.; Mermer, O.; Demir, I.

2026-04-20 occupational and environmental health

10.64898/2026.04.13.26350778 medRxiv

Road traffic accidents represent a global public safety crisis, necessitating advanced computational tools for accurate injury severity prediction and effective decision support. This study evaluates high-performing ensemble machine learning models, including AdaBoost, XGBoost, LightGBM, HistGBRT, CatBoost, Gradient Boosting, NGBoost, and Random Forest, using a comprehensive National Highway Traffic Safety Administration (NHTSA) dataset from 2018 to 2022. While all models demonstrated exceptional predictive accuracy, with HistGBRT achieving the highest overall accuracy of 92.26%, a defining achievement of this work is the perfect classification (100% precision and recall) of fatal injuries across all ensemble architectures. To bridge the gap between predictive performance and actionable intelligence, this research integrates SHapley Additive exPlanations (SHAP) to provide both global insights into dataset-wide risk factors and local, instance-specific rationales for individual crash events. The global analysis identified ethnicity, airbag deployment, and harmful event type as primary drivers of injury severity, while local force and waterfall plots revealed the precise "push and pull" of variables for specific incidents. The results offer a robust, interpretable framework for stakeholders tasked with improving traffic safety and mitigating crash-related harm.
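The abstract separates local, per-crash SHAP explanations (force and waterfall plots) from global rankings. Both rest on one property of Shapley values: for a single instance, the base value plus the feature contributions equals the model's prediction.

The sketch below illustrates that property. It assumes a hypothetical three-feature toy scoring function, not any of the study's trained ensembles. It also uses a single all-zeros baseline instead of the background dataset that SHAP implementations typically average over. The feature names (`airbag_deployed`, `rollover_event`, `speeding`) are illustrative only. The code computes exact Shapley values by enumerating feature coalitions, checks additivity for one instance, and averages absolute contributions over instances to produce a global ranking.

```python
from itertools import combinations
from math import factorial


# Hypothetical toy "severity score" model (NOT one of the study's ensembles).
# The product term gives the model a feature interaction that Shapley values must split.
def model(z):
    return (0.5 * z["airbag_deployed"]
            + 0.3 * z["rollover_event"]
            + 0.2 * z["airbag_deployed"] * z["rollover_event"]
            + 0.1 * z["speeding"])


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at instance x; absent features take baseline values."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition use the instance's values; the rest use the baseline.
        z = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return f(z)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


baseline = {"airbag_deployed": 0, "rollover_event": 0, "speeding": 0}

# Local explanation for one hypothetical crash record.
x = {"airbag_deployed": 1, "rollover_event": 1, "speeding": 0}
phi = shapley_values(model, x, baseline)
base_value = model(baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
print(f"base {base_value:.3f} + contributions = {base_value + sum(phi.values()):.3f}; "
      f"model(x) = {model(x):.3f}")

# Global importance: mean absolute Shapley value across (hypothetical) instances.
instances = [x, {"airbag_deployed": 0, "rollover_event": 1, "speeding": 1}]
per_instance = [shapley_values(model, inst, baseline) for inst in instances]
global_importance = {k: sum(abs(p[k]) for p in per_instance) / len(per_instance) for k in x}
print(global_importance)
```

In this toy case, the interaction term's weight of 0.2 is split evenly between `airbag_deployed` and `rollover_event`. That split yields contributions of about 0.6 and 0.4, which sum with the base value to the prediction of 1.0.

Exact enumeration is exponential in the number of features. Tree-specific algorithms such as TreeSHAP avoid that cost for ensembles like the ones evaluated in the study.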
