Explainable machine learning for revisiting reported Irritable Bowel Syndrome correlates in a student cohort
Ramirez-Lopez, L.; Kang, P.
Irritable Bowel Syndrome (IBS) affects a substantial proportion of university students, yet its risk factors remain incompletely characterised in South Asian populations. We reanalysed a publicly available dataset of 550 Bangladeshi students from Hasan et al. [1], conducting a data audit that identified implausible records, including males reporting menstrual symptoms, and reduced the analytic sample to 506 observations. Using Explainable Boosting Machines (EBMs), which capture non-linear effects and pairwise interactions without sacrificing interpretability, we found that psychological distress, elevated BMI and academic dissatisfaction were the strongest predictors of IBS (mean AUC = 0.852 across 100 stratified train-test splits). Critically, several findings diverged from the original logistic regression analysis. Physical activity showed a non-linear pattern, with an association with IBS emerging only at high intensity; the association with gender was substantially weaker once metabolic and psychological factors were also accounted for; and malnourishment did not have as strong an impact as in the original study. These divergences likely arise because the machine-learning model captures non-linear effects and interactions that were not represented in the original regression specification. Our findings underscore the value of reanalysing existing datasets with methods suited to capturing complexity and highlight data quality verification as a necessary step in secondary analysis.
Author summary
We reanalysed a dataset on Irritable Bowel Syndrome (IBS) among university students in Dhaka, Bangladesh. Before modelling, we audited the dataset, removed implausible records, and reconstructed the IBS classification from the Rome III questionnaire. We then applied an interpretable machine-learning model capable of modelling non-linear effects and interactions between variables.
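The audit step described above can be sketched as a set of named plausibility rules applied to each record. This is a minimal illustration under stated assumptions, not the authors' code: the field names `gender` and `menstrual_symptoms` and their `"male"`/`"yes"` codings are hypothetical, and only the one rule named in the abstract is included. The other checks behind the reduction from 550 to 506 records are not specified here.

```python
# HYPOTHETICAL field names and codings; the original dataset's columns are not given here.
RULES = {
    # The only rule named in the abstract; the study's other checks are not specified.
    "male_reporting_menstrual_symptoms": lambda r: r["gender"] == "male"
    and r["menstrual_symptoms"] == "yes",
}


def audit(records, rules=RULES):
    """Split records into kept rows and (row, names of rules it violated) pairs."""
    kept, flagged = [], []
    for rec in records:
        hits = [name for name, rule in rules.items() if rule(rec)]
        if hits:
            flagged.append((rec, hits))
        else:
            kept.append(rec)
    return kept, flagged


records = [
    {"id": 1, "gender": "female", "menstrual_symptoms": "yes"},
    {"id": 2, "gender": "male", "menstrual_symptoms": "yes"},
    {"id": 3, "gender": "male", "menstrual_symptoms": "no"},
]
kept, flagged = audit(records)
print([r["id"] for r in kept])                    # [1, 3]
print([(r["id"], hits) for r, hits in flagged])   # [(2, ['male_reporting_menstrual_symptoms'])]
```

Keeping each flagged record together with the rule that caught it, rather than silently dropping it, makes the audit easy to report and review.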
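The abstract reports a mean AUC across 100 stratified train-test splits. The sketch below shows such an evaluation loop in plain Python, with several assumptions: the 80/20 split proportion and the seeds are hypothetical (the abstract does not give them), the model is assumed to be refitted on every split, and `fit_toy_scorer` is a deliberately simple placeholder, not an EBM. In a real analysis the `fit` callback would train an EBM (for example `ExplainableBoostingClassifier` from the `interpret` package) and return a function that wraps its `predict_proba` for one row. The data below are synthetic.

```python
import random
from statistics import mean


def roc_auc(y_true, scores):
    """Rank-based (Mann-Whitney) AUC: share of case/control pairs ranked correctly, ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(y, test_frac, rng):
    """Shuffle each class separately so train and test keep roughly the same case ratio."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = max(1, round(test_frac * len(idx)))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def repeated_auc(X, y, fit, n_splits=100, test_frac=0.2, seed=0):
    """Refit on each stratified split; return (mean test AUC, list of per-split AUCs).

    `fit(X_train, y_train)` must return a function mapping one row to a risk score.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_splits):
        train, test = stratified_split(y, test_frac, rng)
        predict = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in test], [predict(X[i]) for i in test]))
    return mean(aucs), aucs


def fit_toy_scorer(X_train, y_train):
    """HYPOTHETICAL stand-in for an EBM: weight = case mean minus control mean per feature."""
    cases = [x for x, v in zip(X_train, y_train) if v == 1]
    controls = [x for x, v in zip(X_train, y_train) if v == 0]
    weights = [mean(r[j] for r in cases) - mean(r[j] for r in controls)
               for j in range(len(X_train[0]))]
    return lambda row: sum(w * v for w, v in zip(weights, row))


# Synthetic data only: feature 0 is shifted in cases, feature 1 is pure noise.
data_rng = random.Random(1)
y = [1] * 40 + [0] * 60
X = [[data_rng.gauss(1.5 if v else 0.0, 1.0), data_rng.gauss(0.0, 1.0)] for v in y]
mean_auc, per_split = repeated_auc(X, y, fit_toy_scorer)
print(f"mean AUC over {len(per_split)} splits: {mean_auc:.3f}")
```

Averaging over many random splits, rather than relying on a single hold-out set, reduces how much the reported AUC depends on one particular partition of a sample of roughly 500 students.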
Psychological distress (particularly anxiety and stress), body mass index, and dissatisfaction with academic major showed the strongest associations with IBS. The model also identified several interaction effects involving BMI. Our results differ in several respects from the original regression analysis, suggesting that modelling assumptions and data validation can influence the interpretation of IBS correlates. This study shows how explainable machine-learning models can complement conventional statistical analyses and how data validation can affect results in secondary analyses.
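The interaction effects mentioned above follow from the structure of an EBM, which is a generalised additive model extended with selected pairwise terms (often called a GA2M). For a binary outcome such as IBS status, the model can be written as below. The notation is ours, and this summary does not list which feature pairs involving BMI were retained.

```latex
\operatorname{logit} \Pr(\mathrm{IBS} = 1 \mid \mathbf{x})
  = \beta_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{(j,k) \in \mathcal{S}} f_{jk}(x_j, x_k)
```

Here each f_j is a shape function learned for a single feature, S is the set of selected feature pairs, and an interaction involving BMI is any term f_jk in which one of the two features is BMI. Because every term depends on at most two features, each can be plotted and inspected directly, which is what keeps the model interpretable while still capturing non-linear effects.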