SRSA-VAE: Self-Attention-Based Feature Learning for Single-Cell Multimodal Clustering

Das, R.; Dey, A.; Maulik, U.; Bandyopadhyay, S.

2026-05-11 bioinformatics
10.64898/2026.05.06.723212 bioRxiv

Clustering plays a critical role in the analysis of single-cell omics data for identifying cellular heterogeneity and uncovering biological mechanisms. However, the high dimensionality, sparsity, and multimodal nature of single-cell datasets such as single-cell RNA sequencing (scRNA-seq) and Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) pose significant challenges for effective feature and representation learning. Traditional dimensionality reduction methods often rely on linear transformations and fail to capture complex nonlinear relationships between gene and protein expression profiles. In this work, we propose SRSA-VAE, a scalable variational autoencoder framework that integrates a residual self-attention encoder for context-aware feature learning and multimodal representation learning. The proposed model dynamically contextualizes gene and protein representations through a self-attention mechanism, enabling the encoder to capture inter-cell relationships and emphasize biologically informative signals. A scalable residual connection further stabilizes training and preserves essential input information during latent representation learning. We evaluate SRSA-VAE on five large-scale publicly available single-cell datasets, including both scRNA-seq and CITE-seq data, and compare its performance with established deep generative models. Experimental results demonstrate that SRSA-VAE consistently outperforms existing methods in Adjusted Rand Index (ARI) across benchmark datasets, with particularly strong gains on complex immune cell populations. Ablation studies further confirm the importance of the self-attention mechanism and residual connection in enhancing model stability and clustering accuracy. The proposed model offers a generalizable, robust, and scalable solution for single-cell clustering tasks.

Code repository: https://github.com/rangan2510/srsa-vae
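
The abstract reports clustering quality as Adjusted Rand Index (ARI). As a reminder of what that metric measures, here is a minimal pure-Python sketch of the standard pair-counting ARI formula. It is not taken from the SRSA-VAE repository; the function name `adjusted_rand_index` and the toy label lists are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard pair-counting ARI (hypothetical helper, not the authors' code)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the partitions agree trivially

    # Pairs of cells placed together in both partitions (contingency cells).
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    # Pairs placed together in each partition on its own.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs  # chance agreement
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both partitions
    return (together_both - expected) / (maximum - expected)


# Toy example: cluster IDs are arbitrary, so a relabelled but identical
# partition still scores 1.0.
reference = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]
score = adjusted_rand_index(reference, relabelled)
```

ARI is 1.0 for identical partitions regardless of how cluster IDs are named, is close to 0 for random assignments, and can be negative when agreement is worse than chance.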

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Nature Communications | 4913 | Top 18% | 10.1%
2 | Bioinformatics | 1061 | Top 3% | 9.1%
3 | Nucleic Acids Research | 1128 | Top 2% | 7.2%
4 | Advanced Science | 249 | Top 2% | 6.8%
5 | Genome Research | 409 | Top 0.4% | 6.4%
6 | Briefings in Bioinformatics | 326 | Top 0.9% | 6.3%
7 | Genome Biology | 555 | Top 1% | 4.8%
(50% of probability mass above)
8 | Nature Methods | 336 | Top 2% | 4.8%
9 | Nature Machine Intelligence | 61 | Top 1.0% | 3.6%
10 | Proceedings of the National Academy of Sciences | 2130 | Top 25% | 2.6%
11 | Patterns | 70 | Top 0.5% | 2.4%
12 | Communications Biology | 886 | Top 5% | 2.1%
13 | Cell Systems | 167 | Top 6% | 2.1%
14 | Nature Biotechnology | 147 | Top 4% | 2.1%
15 | Nature Computational Science | 50 | Top 0.5% | 1.9%
16 | Genome Medicine | 154 | Top 4% | 1.7%
17 | PLOS Computational Biology | 1633 | Top 16% | 1.7%
18 | BMC Bioinformatics | 383 | Top 5% | 1.5%
19 | Cell Reports Methods | 141 | Top 3% | 1.5%
20 | Bioinformatics Advances | 184 | Top 3% | 1.5%
21 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 1% | 1.2%
22 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.4% | 1.1%
23 | Cell Genomics | 162 | Top 6% | 0.9%
24 | Science Advances | 1098 | Top 28% | 0.8%
25 | Frontiers in Genetics | 197 | Top 10% | 0.7%
26 | iScience | 1063 | Top 34% | 0.7%
27 | Frontiers in Molecular Biosciences | 100 | Top 5% | 0.7%
28 | PLOS ONE | 4510 | Top 71% | 0.6%
29 | Computational and Structural Biotechnology Journal | 216 | Top 10% | 0.6%
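
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The sketch below is an illustration only: the helper name `journals_to_reach` is hypothetical, and the inputs are the rounded values shown in the list, so the cumulative total is approximate.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Return (count, cumulative mass) for the smallest set of top-ranked
    journals whose summed probability reaches the threshold, or None."""
    cumulative = 0.0
    ranked = sorted(probabilities, reverse=True)
    for count, probability in enumerate(ranked, start=1):
        cumulative += probability
        if cumulative >= threshold:
            return count, cumulative
    return None


# Rounded probabilities (in percent) of the ten highest-ranked journals.
top_probabilities = [10.1, 9.1, 7.2, 6.8, 6.4, 6.3, 4.8, 4.8, 3.6, 2.6]
count, mass = journals_to_reach(top_probabilities)
```

With these rounded inputs the first six journals sum to about 45.9%, and adding the seventh brings the total to about 50.7%, which matches the stated cutoff.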