
Explainable machine learning for revisiting reported Irritable Bowel Syndrome correlates in a student cohort

Ramirez-Lopez, L.; Kang, P.

2026-04-15 gastroenterology
10.64898/2026.04.13.26350820 medRxiv

Irritable Bowel Syndrome (IBS) affects a substantial proportion of university students, yet its associated factors remain incompletely characterised in South Asian populations. We reanalysed a publicly available dataset of 550 Bangladeshi students from Hasan et al. [1], conducting a data audit that identified implausible records, including males reporting menstrual symptoms, and reduced the analytic sample to 506 observations. Using Explainable Boosting Machines (EBMs), which capture non-linear effects and pairwise interactions without sacrificing interpretability, we found that psychological distress, elevated BMI and academic dissatisfaction were the strongest predictors of IBS (mean AUC = 0.852 across 100 stratified train-test splits). Critically, several findings diverged from the original logistic regression analysis. Physical activity showed a non-linear pattern in which risk emerged only at high intensity. The association with gender was substantially weaker once metabolic and psychological factors were also accounted for. Malnourishment did not have as strong an impact as in the original study. These divergences likely arise because the machine-learning model captures non-linear effects and interactions that were not represented in the original regression specification. Our findings underscore the value of reanalysing existing datasets with methods suited to capturing complexity, and highlight data quality verification as a necessary step in secondary analyses.

Author summary

We reanalysed a dataset on Irritable Bowel Syndrome (IBS) among university students in Dhaka, Bangladesh. Before modelling, we audited the dataset, removed implausible records, and reconstructed the IBS classification from the Rome III questionnaire. We then applied an interpretable machine-learning model capable of modelling non-linear effects and interactions between variables. Psychological distress (particularly anxiety and stress), body mass index, and dissatisfaction with academic major showed the strongest associations with IBS. The model also identified several interaction effects involving BMI. Our results differ in several respects from the original regression analysis, suggesting that modelling assumptions and data validation can influence the interpretation of IBS correlates. This study shows how explainable machine-learning models can complement conventional statistical analyses and how data validation can affect results in secondary analyses.
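The audit rule named in the abstract, removing males who report menstrual symptoms, can be sketched as a simple consistency filter. This is a minimal sketch only. The field names `gender` and `menstrual_symptoms` and the helpers `is_implausible` and `audit` are hypothetical. The abstract gives this rule as one example ("including"), so the sketch does not reproduce the full audit that reduced the sample from 550 to 506 records.

```python
# Minimal sketch of one data-audit rule: flag records that are internally
# inconsistent (male respondents reporting menstrual symptoms).
# Hypothetical field names: "gender", "menstrual_symptoms".
from typing import Any

Record = dict[str, Any]


def is_implausible(record: Record) -> bool:
    """Return True if a male respondent reports menstrual symptoms."""
    gender = str(record.get("gender", "")).strip().lower()
    return gender == "male" and bool(record.get("menstrual_symptoms"))


def audit(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Split records into (kept, dropped) according to is_implausible."""
    kept = [r for r in records if not is_implausible(r)]
    dropped = [r for r in records if is_implausible(r)]
    return kept, dropped


if __name__ == "__main__":
    sample = [
        {"id": 1, "gender": "Female", "menstrual_symptoms": True},
        {"id": 2, "gender": "Male", "menstrual_symptoms": True},
        {"id": 3, "gender": "Male", "menstrual_symptoms": False},
    ]
    kept, dropped = audit(sample)
    print([r["id"] for r in kept], [r["id"] for r in dropped])  # [1, 3] [2]
```

Keeping the dropped records, rather than discarding them silently, makes it easy to report how many observations each audit rule removed.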

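The evaluation reported in the abstract, a mean AUC across 100 stratified train-test splits, can be illustrated with a stdlib-only sketch. It repeatedly draws a stratified split, fits a model on the training part, scores the test part, and averages the AUC. Several details are assumptions for illustration and are not taken from the paper: the 30% test fraction, the seed, and the generic `fit_predict` callable. In the study the model is an EBM, which in Python is commonly available as `ExplainableBoostingClassifier` in the `interpret` package. scikit-learn's `StratifiedShuffleSplit` and `roc_auc_score` are library equivalents of the helpers below.

```python
# Repeated stratified train-test evaluation with a mean AUC (stdlib only).
# Assumptions (not from the paper): the 30% test fraction, the seed, and the
# generic fit_predict(X_train, y_train, X_test) -> scores callable.
import random
from statistics import mean


def roc_auc(y_true, scores):
    """AUC = probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the test set")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(y, test_size, rng):
    """Shuffle each class separately so the test set keeps the class proportions."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_size))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def mean_auc(X, y, fit_predict, n_splits=100, test_size=0.3, seed=0):
    """Average test-set AUC over n_splits independent stratified splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_splits):
        tr, te = stratified_split(y, test_size, rng)
        scores = fit_predict([X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te])
        aucs.append(roc_auc([y[i] for i in te], scores))
    return mean(aucs)


if __name__ == "__main__":
    # Toy data: the single feature separates the classes perfectly.
    X = [[i] for i in range(20)]
    y = [0] * 10 + [1] * 10
    score_by_feature = lambda X_tr, y_tr, X_te: [x[0] for x in X_te]
    print(mean_auc(X, y, score_by_feature))  # 1.0
```

With a real model, `fit_predict` would fit the classifier on the training rows and return the positive-class column of `predict_proba` for the test rows. Averaging over many splits gives a less split-dependent estimate than a single hold-out set.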
Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. PLOS Digital Health (91 papers in training set; Top 0.1%): 42.7%
2. Scientific Reports (3102 papers in training set; Top 4%): 11.2%
50% of probability mass above
3. PLOS ONE (4510 papers in training set; Top 23%): 7.7%
4. Computer Methods and Programs in Biomedicine (27 papers in training set; Top 0.1%): 3.9%
5. Frontiers in Physiology (93 papers in training set; Top 1%): 3.5%
6. BMC Medicine (163 papers in training set; Top 2%): 3.1%
7. Journal of Medical Internet Research (85 papers in training set; Top 2%): 2.0%
8. PeerJ (261 papers in training set; Top 6%): 1.8%
9. BioData Mining (15 papers in training set; Top 0.4%): 1.4%
10. Physiological Measurement (12 papers in training set; Top 0.3%): 1.3%
11. Frontiers in Psychology (49 papers in training set; Top 0.9%): 1.0%
12. Heliyon (146 papers in training set; Top 4%): 1.0%
13. BMC Bioinformatics (383 papers in training set; Top 6%): 1.0%
14. F1000Research (79 papers in training set; Top 4%): 0.8%
15. Cureus (67 papers in training set; Top 4%): 0.8%
16. Frontiers in Artificial Intelligence (18 papers in training set; Top 0.6%): 0.8%
17. Statistics in Medicine (34 papers in training set; Top 0.3%): 0.8%
18. BMC Medical Research Methodology (43 papers in training set; Top 2%): 0.7%
19. Cancers (200 papers in training set; Top 5%): 0.7%
20. Trials (25 papers in training set; Top 2%): 0.5%
21. Frontiers in Psychiatry (83 papers in training set; Top 4%): 0.5%
22. Computational Biology and Chemistry (23 papers in training set; Top 0.7%): 0.5%
23. Life (27 papers in training set; Top 0.7%): 0.5%
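The statement that the top 2 journals account for 50% of the predicted probability mass can be checked with a cumulative sum over the listed percentages. The sketch copies only the first five probabilities from the list, and the helper name `top_k_for_mass` is hypothetical. With these values, 42.7% + 11.2% = 53.9%, so two journals are the fewest that reach 50%.

```python
# Cumulative-probability check of the "top 2 journals" statement.
# Values copied from the first five entries of the list; the helper name is hypothetical.
from itertools import accumulate

PROBS = {
    "PLOS Digital Health": 42.7,
    "Scientific Reports": 11.2,
    "PLOS ONE": 7.7,
    "Computer Methods and Programs in Biomedicine": 3.9,
    "Frontiers in Physiology": 3.5,
}


def top_k_for_mass(probs: dict[str, float], threshold: float) -> int:
    """Smallest k such that the k most probable entries reach `threshold` percent."""
    ordered = sorted(probs.values(), reverse=True)
    for k, total in enumerate(accumulate(ordered), start=1):
        if total >= threshold:
            return k
    raise ValueError("threshold not reached by the listed entries")


if __name__ == "__main__":
    print(top_k_for_mass(PROBS, 50.0))  # 2
```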