EBEx: an Ensemble-Based Explainable Framework for Gene Calling in Heterogeneous Diseases

Pose-Lagoa, I.; Urda-Garcia, B.; Olvera, N.; Sanchez-Valle, J.; Faner, R.; Valencia, A.; Carbonell-Caballero, J.

2026-03-14 bioinformatics
10.64898/2026.03.12.710464 bioRxiv

Complex and clinically heterogeneous diseases pose significant challenges for gene prioritisation and patient stratification, as relevant genes often show weak or context-specific signals and transcriptomic datasets are limited in size. These limitations hinder the discovery of robust molecular signatures using traditional case-control approaches and motivate computational pipelines capable of capturing molecular diversity. Here, we present an explainable ensemble-based AI pipeline to prioritise disease-relevant genes from transcriptomic data, using Chronic Obstructive Pulmonary Disease (COPD) as a use case. To retain biologically relevant interactors obscured by molecular heterogeneity, the framework integrates data-driven signals with curated COPD-related gene sets, further expanded through network-based prioritisation and supported by molecular interactions. Gene relevance is evaluated via aggregated explainability scores across multiple classifier configurations to ensure robust candidate selection. The final set comprised <8% of evaluated genes, with ~62% arising from network-based expansion, substantially reducing dimensionality while preserving biological heterogeneity. Beyond case-control classification, the approach identified candidate genes and molecular subgroups associated with specific clinical features, capturing patient-level heterogeneity. The prioritised genes recapitulated key disease-related processes, including immune responses and extracellular matrix degradation, and highlighted additional associations such as the enrichment of the IL-4 and IL-13 signalling pathway, which is of clinical interest given ongoing biologic developments targeting these axes. Our pipeline outperformed existing methods in discriminating COPD from controls, and the final gene list was validated in independent cohorts. Implemented as a scalable and reusable R package, this framework facilitates the study of molecular heterogeneity in complex diseases like COPD, supporting advances in diagnosis and precision medicine.

Availability and implementation: EBEx code and tutorials can be found at https://iposelag.github.io/EBEx/
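
The abstract describes scoring gene relevance by aggregating explainability scores across multiple classifier configurations. The following is a minimal Python sketch of one way such an aggregation could work, not the EBEx implementation (which is an R package). The function names, the configuration labels, the gene names, the toy scores, and the choice of normalising then averaging are all assumptions made for illustration.

```python
from statistics import mean


def normalise(scores):
    """Scale one configuration's absolute importance scores so they sum to 1."""
    total = sum(abs(value) for value in scores.values())
    if total == 0:
        return {gene: 0.0 for gene in scores}
    return {gene: abs(value) / total for gene, value in scores.items()}


def aggregate_importance(scores_by_config):
    """Average the normalised importance of each gene across configurations.

    A gene missing from a configuration contributes 0 for that configuration.
    """
    normalised = [normalise(scores) for scores in scores_by_config.values()]
    genes = set().union(*(scores.keys() for scores in normalised))
    return {gene: mean(n.get(gene, 0.0) for n in normalised) for gene in genes}


def top_fraction(aggregated, fraction):
    """Return the highest-scoring fraction of genes (at least one gene)."""
    ranked = sorted(aggregated, key=lambda gene: (-aggregated[gene], gene))
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Toy, hypothetical importance scores; not data from the paper.
toy_scores = {
    "config_1": {"GENE_A": 0.6, "GENE_B": 0.3, "GENE_C": 0.1, "GENE_D": 0.0},
    "config_2": {"GENE_A": 0.5, "GENE_B": 0.2, "GENE_C": 0.3, "GENE_D": 0.0},
    "config_3": {"GENE_A": 0.7, "GENE_B": 0.2, "GENE_C": 0.1, "GENE_D": 0.0},
}

aggregated = aggregate_importance(toy_scores)
selected = top_fraction(aggregated, 0.5)
print(selected)
```

Normalising within each configuration before averaging keeps a classifier with large raw scores from dominating the aggregate; a rank-based aggregation would be an equally plausible alternative.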

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set; Top 1%; 22.1%
2. Genome Medicine: 154 papers in training set; Top 0.3%; 14.1%
3. NAR Genomics and Bioinformatics: 214 papers in training set; Top 0.1%; 9.9%
4. PLOS Computational Biology: 1633 papers in training set; Top 7%; 4.8%

50% of probability mass above

5. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 2%; 3.6%
6. Nature Communications: 4913 papers in training set; Top 41%; 3.5%
7. Patterns: 70 papers in training set; Top 0.3%; 3.5%
8. BMC Bioinformatics: 383 papers in training set; Top 3%; 2.7%
9. Scientific Reports: 3102 papers in training set; Top 47%; 2.4%
10. Bioinformatics Advances: 184 papers in training set; Top 2%; 2.0%
11. GigaScience: 172 papers in training set; Top 1%; 2.0%
12. iScience: 1063 papers in training set; Top 12%; 1.9%
13. Briefings in Bioinformatics: 326 papers in training set; Top 4%; 1.8%
14. npj Digital Medicine: 97 papers in training set; Top 2%; 1.7%
15. Nucleic Acids Research: 1128 papers in training set; Top 11%; 1.7%
16. Nature Machine Intelligence: 61 papers in training set; Top 2%; 1.7%
17. npj Systems Biology and Applications: 99 papers in training set; Top 1%; 1.5%
18. International Journal of Molecular Sciences: 453 papers in training set; Top 11%; 1.2%
19. Communications Biology: 886 papers in training set; Top 15%; 1.2%
20. Advanced Science: 249 papers in training set; Top 15%; 1.1%
21. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 2%; 0.9%
22. Computers in Biology and Medicine: 120 papers in training set; Top 4%; 0.9%
23. BMC Medical Genomics: 36 papers in training set; Top 1%; 0.9%
24. Journal of Biomedical Informatics: 45 papers in training set; Top 2%; 0.7%
25. PLOS ONE: 4510 papers in training set; Top 69%; 0.7%
26. Journal of Translational Medicine: 46 papers in training set; Top 4%; 0.6%
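
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, and the list holds just the first eight percentages shown above.

```python
from itertools import accumulate

# Predicted probabilities (in percent) of the first eight listed journals.
probabilities = [22.1, 14.1, 9.9, 4.8, 3.6, 3.5, 3.5, 2.7]


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked journals are needed to reach the threshold.

    Returns None if the cumulative sum never reaches the threshold.
    """
    for rank, cumulative in enumerate(accumulate(probabilities), start=1):
        if cumulative >= threshold:
            return rank
    return None


k = journals_to_reach(probabilities, 50.0)
print(k)
```

The first three journals sum to 46.1%, and adding the fourth brings the total to 50.9%, which is why the 50% marker sits after rank 4.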