
scRGCL: Neighbor-Aware Graph Contrastive Learning for Robust Single-Cell Clustering

Fan, J.; Liu, F.; Lai, X.

2026-03-18 bioinformatics
10.64898/2026.03.16.712039 bioRxiv

Accurate cell type identification is a fundamental step in single-cell RNA sequencing (scRNA-seq) data analysis, providing critical insights into cellular heterogeneity at high resolution. However, the high dimensionality, zero inflation, and long-tailed distribution of scRNA-seq data pose significant computational challenges for conventional clustering approaches. Although recent deep learning-based methods use contrastive learning to jointly learn representations and cluster assignments, they often overlook cluster-level information, leading to suboptimal feature extraction for downstream tasks. To address these limitations, we propose scRGCL, a single-cell clustering method that learns a regularized representation guided by contrastive learning. Specifically, scRGCL captures the cell-type-associated expression structure by clustering similar cells together while ensuring consistency. For each sample, the model performs negative sampling by selecting cells from distinct clusters, thereby ensuring semantic dissimilarity between the target cell and its negative pairs. Moreover, scRGCL introduces a neighbor-aware re-weighting strategy that increases the contribution of samples from clusters closely related to the target. This mechanism prevents cells from the same category from being mistakenly pushed apart, effectively preserving intra-cluster compactness. Extensive experiments on fourteen public datasets demonstrate that scRGCL consistently outperforms state-of-the-art methods, as evidenced by significant improvements in normalized mutual information (NMI) and adjusted Rand index (ARI). Moreover, ablation studies confirm that the integration of cluster-aware negative sampling and the neighbor-aware re-weighting module is essential for achieving high-fidelity clustering.
By harmonizing cell-level contrast with cluster-level guidance, scRGCL provides a robust and scalable framework that advances the precision of automated cell-type discovery in increasingly complex single-cell landscapes.

Key Messages
- scRGCL uses contrastive learning on a regularized representation for single-cell clustering.
- scRGCL outperforms four state-of-the-art methods on 15 datasets.
- scRGCL's cluster-aware negative sampling and neighbor-aware re-weighting modules are essential for high-fidelity single-cell clustering.
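
The abstract gives no formulas, so the following is only a rough sketch of how cluster-aware negative sampling and neighbor-aware re-weighting could be combined in an InfoNCE-style loss. It is not the authors' implementation. The function name `cluster_aware_info_nce`, the parameters `temperature` and `beta`, the use of cluster centroids, and the exponential weighting are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def cluster_aware_info_nce(z_a, z_b, labels, temperature=0.5, beta=1.0):
    """Hypothetical loss: InfoNCE with cluster-masked, re-weighted negatives.

    z_a, z_b : (n, d) embeddings of two augmented views of the same n cells.
    labels   : (n,) current cluster assignments (e.g. from k-means).
    beta     : assumed sharpness of the neighbor-aware weighting.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    _, inv = np.unique(labels, return_inverse=True)
    inv = inv.reshape(-1)
    k = inv.max() + 1

    # Cluster centroids in embedding space and their pairwise cosine similarity.
    centroids = l2_normalize(np.stack([z_a[inv == c].mean(axis=0) for c in range(k)]))
    c_sim = centroids @ centroids.T                      # (k, k)

    # Cluster-aware negatives: only cells assigned to a different cluster.
    neg_mask = inv[:, None] != inv[None, :]              # (n, n)

    # Neighbor-aware weights: negatives from clusters closer to the anchor's
    # cluster contribute more. Weights are rescaled to average 1 per anchor.
    w = np.exp(beta * c_sim[inv][:, inv]) * neg_mask
    n_neg = neg_mask.sum(axis=1, keepdims=True)
    w = w * n_neg / np.maximum(w.sum(axis=1, keepdims=True), 1e-12)

    sim = (z_a @ z_b.T) / temperature
    pos = np.exp(np.diag(sim))                           # same cell, other view
    neg = (w * np.exp(sim)).sum(axis=1)
    return float(np.mean(-np.log(pos / (pos + neg))))
```

Masking out same-cluster cells means a cell is never pushed away from cells currently assigned to its own cluster, which is the behavior the abstract attributes to the method. With `beta=0` the weights become uniform and the sketch reduces to a plain cluster-masked InfoNCE loss.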

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Genome Research | 409 | Top 0.1% | 17.1% |
| 2 | Bioinformatics | 1061 | Top 2% | 14.4% |
| 3 | Briefings in Bioinformatics | 326 | Top 0.4% | 9.9% |
| 4 | Genome Biology | 555 | Top 0.9% | 6.7% |
| 5 | Nature Communications | 4913 | Top 30% | 6.2% |

50% of probability mass above

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 6 | Nucleic Acids Research | 1128 | Top 3% | 6.2% |
| 7 | Nature Methods | 336 | Top 3% | 3.9% |
| 8 | Nature Biotechnology | 147 | Top 3% | 3.0% |
| 9 | Genomics, Proteomics & Bioinformatics | 171 | Top 3% | 2.0% |
| 10 | Cell Reports Methods | 141 | Top 2% | 1.8% |
| 11 | BMC Bioinformatics | 383 | Top 4% | 1.8% |
| 12 | Proceedings of the National Academy of Sciences | 2130 | Top 31% | 1.7% |
| 13 | PLOS Computational Biology | 1633 | Top 17% | 1.7% |
| 14 | Bioinformatics Advances | 184 | Top 3% | 1.7% |
| 15 | Advanced Science | 249 | Top 11% | 1.7% |
| 16 | Cell Systems | 167 | Top 8% | 1.6% |
| 17 | Nature Computational Science | 50 | Top 0.8% | 1.4% |
| 18 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.3% | 1.3% |
| 19 | Genome Medicine | 154 | Top 6% | 1.2% |
| 20 | iScience | 1063 | Top 28% | 0.9% |
| 21 | Nature Machine Intelligence | 61 | Top 3% | 0.8% |
| 22 | Patterns | 70 | Top 2% | 0.8% |
| 23 | NAR Genomics and Bioinformatics | 214 | Top 4% | 0.8% |
| 24 | Frontiers in Genetics | 197 | Top 10% | 0.7% |
| 25 | PLOS ONE | 4510 | Top 70% | 0.7% |
| 26 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 3% | 0.6% |
| 27 | Cell Genomics | 162 | Top 8% | 0.6% |
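
The 50% cutoff in the list can be checked with a short cumulative sum over the listed probabilities. This is a small sketch for illustration. The helper name `journals_to_reach` is hypothetical, and only the first ten probabilities from the list are included.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for the ten highest-ranked journals,
# copied from the list above.
probs = [17.1, 14.4, 9.9, 6.7, 6.2, 6.2, 3.9, 3.0, 2.0, 1.8]

def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative total) at which the running sum first
    meets the threshold, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank, total
    return None

rank, total = journals_to_reach(probs, 50.0)
print(rank, round(total, 1))
```

The running sum is about 48.1% after four journals and about 54.3% after five, so the fifth journal is the first to carry the cumulative total past 50%.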