Interpretable Predictive Modeling for Medical Data Using Boolean Rule-aware Regression

Eskandarian, M.; Malekpour, S. A.

2026-05-18 bioinformatics
10.64898/2026.05.14.725084 bioRxiv
Purpose: In clinical practice, accurate prediction of disease risk must be accompanied by transparent, human-understandable explanations to support diagnostic confidence, guide therapeutic decisions, and meet ethical and regulatory standards. While deep neural networks achieve high predictive performance in tasks such as cancer detection and diabetes risk stratification, their black-box nature prevents clinicians from understanding the reasoning behind predictions, severely limiting trust and safe integration into patient care.

Methods: We present Regression-Based Boolean Rule (RBBR), a framework that automatically derives clinically interpretable Boolean rules directly from patient data. RBBR generates human-readable conjunctions (logical AND combinations) of up to three clinical features, transforms them into inputs for ridge regression to predict binary or multi-class disease outcomes, estimates rule importance via regularized coefficients, and selects the most parsimonious and predictive rule sets using the Bayesian Information Criterion.

Results: Applied to six real-world medical datasets (lung cancer screening and staging, Wisconsin and diagnostic breast cancer, heart failure, and early-stage diabetes risk), RBBR consistently produced concise, clinically meaningful rules (e.g., gender-specific symptom combinations in diabetes, distinct histopathological subpopulations in breast cancer, and symptom-risk factor interactions in lung cancer) with strong explanatory power (R² up to 0.92) and competitive discrimination.

Conclusion: By delivering logical, transparent decision rules aligned with clinical reasoning (if symptom A and B, then high risk), RBBR bridges the gap between predictive accuracy and bedside usability, enabling clinicians to validate predictions, identify high-risk patients, stratify subpopulations, and enhance shared decision-making in routine care.
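The pipeline described in the abstract (AND-conjunctions of up to three features, ridge regression on the resulting rule columns, BIC-based selection) can be illustrated with a minimal sketch. This is not the authors' RBBR implementation: the function names, the penalty `lam`, the cap `max_rules`, the Gaussian form of the BIC, and the "rank rules by absolute coefficient, then refit the top m" selection loop are all assumptions made for illustration, and the features are assumed to be already binarized.

```python
from itertools import combinations, product
import numpy as np

def conjunction_features(X, max_order=3):
    """Return (rules, Z): each rule is a tuple of column indices, and the
    matching column of Z is 1 where all of those binary features are 1."""
    rules, cols = [], []
    for k in range(1, max_order + 1):
        for idx in combinations(range(X.shape[1]), k):
            rules.append(idx)
            cols.append(X[:, list(idx)].all(axis=1).astype(float))
    return rules, np.column_stack(cols)

def ridge_fit(Z, y, lam):
    """Closed-form ridge regression; the intercept is left unpenalized."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    gram = Zc.T @ Zc + lam * np.eye(Z.shape[1])
    coef = np.linalg.solve(gram, Zc.T @ (y - y_mean))
    return coef, y_mean - z_mean @ coef

def gaussian_bic(Z, y, coef, intercept):
    """BIC under a Gaussian error model (an assumption): n*ln(RSS/n) + k*ln(n)."""
    n = len(y)
    rss = float(np.sum((y - (Z @ coef + intercept)) ** 2))
    k = Z.shape[1] + 1  # rule coefficients plus the intercept
    return n * np.log(rss / n + 1e-12) + k * np.log(n)

def select_rules(X, y, max_order=3, lam=0.1, max_rules=5):
    """Hypothetical selection loop: rank rules by |ridge coefficient|,
    refit on the top m rules for m = 1..max_rules, keep the lowest BIC."""
    rules, Z = conjunction_features(X, max_order)
    coef, _ = ridge_fit(Z, y, lam)
    ranked = np.argsort(-np.abs(coef))
    best = None
    for m in range(1, max_rules + 1):
        keep = ranked[:m]
        c, b = ridge_fit(Z[:, keep], y, lam)
        score = gaussian_bic(Z[:, keep], y, c, b)
        if best is None or score < best[0]:
            best = (score, [rules[i] for i in keep])
    return best[1]

# Toy data (not from the paper): every combination of four binary
# features, repeated 20 times, with outcome = feature 0 AND feature 1.
base = np.array(list(product([0, 1], repeat=4)))
X = np.tile(base, (20, 1))
y = (X[:, 0] & X[:, 1]).astype(float)

selected = select_rules(X, y)
print(selected)  # rules are tuples of feature indices, strongest first
```

On this toy outcome the conjunction of features 0 and 1 receives the largest coefficient, which mirrors the "if symptom A and B, then high risk" reading of a rule. Continuous clinical variables would need a binarization step first, which the sketch leaves out.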

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. npj Digital Medicine: 97 papers in training set, Top 0.3%, 18.5%
2. Scientific Reports: 3102 papers in training set, Top 10%, 8.4%
3. Nature Communications: 4913 papers in training set, Top 27%, 6.8%
4. The Lancet Digital Health: 25 papers in training set, Top 0.1%, 6.3%
5. PLOS Computational Biology: 1633 papers in training set, Top 9%, 3.9%
6. Nature Machine Intelligence: 61 papers in training set, Top 1.0%, 3.6%
7. PLOS ONE: 4510 papers in training set, Top 39%, 3.6%

50% of probability mass above

8. Bioinformatics: 1061 papers in training set, Top 5%, 3.6%
9. JMIR Medical Informatics: 17 papers in training set, Top 0.4%, 2.7%
10. JCO Clinical Cancer Informatics: 18 papers in training set, Top 0.3%, 2.6%
11. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 1%, 2.1%
12. Journal of the American Medical Informatics Association: 61 papers in training set, Top 1%, 1.7%
13. Frontiers in Artificial Intelligence: 18 papers in training set, Top 0.3%, 1.7%
14. Communications Medicine: 85 papers in training set, Top 0.3%, 1.5%
15. Acta Psychiatrica Scandinavica: 10 papers in training set, Top 0.2%, 1.5%
16. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 35%, 1.5%
17. Communications Biology: 886 papers in training set, Top 13%, 1.3%
18. Journal of Biomedical Informatics: 45 papers in training set, Top 1.0%, 1.3%
19. Patterns: 70 papers in training set, Top 1%, 1.3%
20. Genome Medicine: 154 papers in training set, Top 6%, 1.2%
21. Advanced Science: 249 papers in training set, Top 14%, 1.2%
22. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 7%, 1.1%
23. Journal of Medical Internet Research: 85 papers in training set, Top 4%, 0.9%
24. Nature Medicine: 117 papers in training set, Top 4%, 0.9%
25. BioData Mining: 15 papers in training set, Top 0.8%, 0.8%
26. GigaScience: 172 papers in training set, Top 3%, 0.8%
27. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 2%, 0.8%
28. Science Advances: 1098 papers in training set, Top 28%, 0.8%
29. Bioinformatics Advances: 184 papers in training set, Top 5%, 0.7%
30. BMC Bioinformatics: 383 papers in training set, Top 7%, 0.7%