Explainable machine learning for revisiting reported Irritable Bowel Syndrome correlates in a student cohort
Ramirez-Lopez, L.; Kang, P.
Show abstract
Irritable Bowel Syndrome (IBS) affects a substantial proportion of university students, yet its factors remain incompletely characterised in South Asian populations. We reanalysed a publicly available dataset of 550 Bangladeshi students from Hasan et al. (2025), conducting a data audit that identified implausible records, including males reporting menstrual symptoms, and reduced the analytic sample to 506 observations. Using Explainable Boosting Machines (EBMs), which capture non-linear effects and pairwise interactions without sacrificing interpretability, we found that psychological distress, elevated BMI and academic dissatisfaction were the strongest predictors of IBS (mean AUC = 0.852 across 100 stratified train-test splits). Critically, several findings diverged from the original logistic regression analysis. Physical activity showed a non-linear risk pattern only at high intensity, the association with gender was substantially weaker when we accounted for metabolic and psychological factors as well and malnourishment does not have a strong an impact as in the original study. These divergences likely arise because the machine-learning model captures non-linear effects and interactions that were not represented in the original regression specification. Our findings underscore the value of reanalysing existing datasets with methods suited to capturing complexity and highlight data quality verification as a necessary step in the secondary analysis.
Matching journals
The top 3 journals account for 50% of the predicted probability mass.