
RareCapsNet: An explainable capsule networks enable robust discovery of rare cell populations from large-scale single-cell transcriptomics

Ray, S.; Lall, S.

2026-02-04 bioinformatics
10.64898/2026.02.02.703229 bioRxiv

In-silico (downstream) analysis of single-cell data has attracted considerable attention from machine learning researchers in the last few years. Recent technological advances and increases in throughput open up new opportunities to discover rare cell types. We develop RareCapsNet, a capsule-network-based technique for identifying rare cells in large single-cell RNA-seq data. RareCapsNet aims to leverage the landmark advantages of capsule networks in the single-cell domain by identifying novel rare cell populations through marker genes derived from a human-interpretable reading of the lower-level (primary) capsules. We demonstrate the explainability of the capsule network for identifying novel markers that act as signatures of particular rare cell populations. A comprehensive evaluation on simulated and real single-cell data demonstrates the efficacy of RareCapsNet for finding rare populations in large scRNA-seq data. RareCapsNet not only outperforms other state-of-the-art methods in specificity and selectivity for identifying rare cell types, but also successfully extracts the transcriptomic signature of the cell population. We also apply RareCapsNet to a multi-batch dataset, where the model stores the knowledge learned from one batch and transfers it to find rare cells in another batch without additional training. Availability and Implementation: RareCapsNet is available at https://github.com/sumantaray/RareCapsNet.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set; Top 1%; 22.8%
2. Briefings in Bioinformatics: 326 papers in training set; Top 0.3%; 10.6%
3. BMC Bioinformatics: 383 papers in training set; Top 2%; 6.5%
4. Bioinformatics Advances: 184 papers in training set; Top 0.4%; 6.4%
5. Nature Machine Intelligence: 61 papers in training set; Top 0.5%; 4.9%
50% of probability mass above
6. Nature Communications: 4913 papers in training set; Top 37%; 4.0%
7. Frontiers in Genetics: 197 papers in training set; Top 2%; 3.6%
8. PLOS Computational Biology: 1633 papers in training set; Top 9%; 3.6%
9. Genome Biology: 555 papers in training set; Top 2%; 3.6%
10. iScience: 1063 papers in training set; Top 8%; 2.6%
11. NAR Genomics and Bioinformatics: 214 papers in training set; Top 1%; 2.1%
12. Nucleic Acids Research: 1128 papers in training set; Top 8%; 2.1%
13. Genome Research: 409 papers in training set; Top 2%; 1.9%
14. Patterns: 70 papers in training set; Top 0.7%; 1.8%
15. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 4%; 1.7%
16. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 35%; 1.5%
17. Genome Medicine: 154 papers in training set; Top 6%; 1.2%
18. Cell Reports Methods: 141 papers in training set; Top 3%; 1.2%
19. Scientific Reports: 3102 papers in training set; Top 68%; 1.1%
20. BMC Genomics: 328 papers in training set; Top 4%; 1.0%
21. PLOS ONE: 4510 papers in training set; Top 62%; 1.0%
22. Advanced Science: 249 papers in training set; Top 16%; 0.9%
23. Nature Methods: 336 papers in training set; Top 6%; 0.8%
24. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 2%; 0.8%
25. Communications Biology: 886 papers in training set; Top 23%; 0.8%
26. Cell Systems: 167 papers in training set; Top 12%; 0.7%
27. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set; Top 0.8%; 0.7%
28. Quantitative Biology: 11 papers in training set; Top 0.9%; 0.7%
29. npj Systems Biology and Applications: 99 papers in training set; Top 3%; 0.5%
30. Nature Computational Science: 50 papers in training set; Top 2%; 0.5%
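
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The snippet below is a minimal sketch, assuming the cutoff is the smallest number of top-ranked journals whose probabilities sum to at least 50%. The function name `journals_to_reach` is hypothetical and is not taken from the site.

```python
from itertools import accumulate

# Predicted probabilities (%) of the ten highest-ranked journals,
# copied in rank order from the list above.
probabilities = [22.8, 10.6, 6.5, 6.4, 4.9, 4.0, 3.6, 3.6, 3.6, 2.6]


def journals_to_reach(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    sum to reach `threshold`, or None if it is never reached.

    Assumption: the cutoff is defined on the cumulative sum in rank order.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


top_k = journals_to_reach(probabilities)
print(top_k)
```

With the values listed here, the running sum first passes 50% at the fifth journal (about 51.2%), which matches the position of the cutoff marker in the list.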