scTGCL: A Transformer-Based Graph Contrastive Learning Approach for Efficiently Clustering Single-Cell RNA-seq Data
Khan, M. S. A.; Kabir, M. H.; Faisal, M. M.
Single-cell RNA sequencing (scRNA-seq) enables characterization of cellular heterogeneity, but clustering remains challenging due to high dimensionality, dropout-induced sparsity, and technical noise. Existing graph-based and contrastive methods often rely on predefined similarity measures or suffer from high computational costs on large datasets. We propose single-cell Transformer-based Graph Contrastive Learning (scTGCL), a framework that integrates multi-head self-attention with graph contrastive learning to learn robust cell representations. The model projects raw expression data into an embedding space and employs multi-head attention to adaptively learn weighted cell-cell graphs that capture diverse biological relationships. For contrastive augmentation, we apply random gene masking at the feature level and random edge dropping on the attention matrices, simulating dropout and structural uncertainty, respectively. A symmetric contrastive loss maximizes agreement between the original and augmented representations, while joint optimization with reconstruction and imputation losses preserves biological interpretability. Experiments on ten real scRNA-seq datasets show that scTGCL consistently outperforms nine state-of-the-art methods in terms of clustering accuracy, normalized mutual information, and adjusted Rand index. Ablation studies confirm the contribution of each architectural component, and robustness analysis on simulated data shows stable performance under varying dropout rates and differential expression levels. Furthermore, scTGCL is computationally efficient, achieving substantially lower runtime on large-scale datasets than existing approaches. The framework provides an accurate, efficient, and scalable solution for single-cell clustering. Source code and datasets are available at https://github.com/ShoaibAbdullahKhan/scTGCL.
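The abstract describes the two augmentations and the symmetric contrastive objective only at a high level. The following is a minimal NumPy sketch of how such pieces could fit together. It assumes a single attention head, independent Bernoulli gene masking, edge dropping with row renormalization, and a symmetric InfoNCE (NT-Xent-style) loss with temperature `tau`; the abstract specifies none of these details. All function names are hypothetical and not taken from the scTGCL repository, and the reconstruction and imputation losses and the training loop are omitted.

```python
import numpy as np


def mask_genes(x, mask_rate, rng):
    """Feature-level augmentation: zero each expression entry with prob. mask_rate."""
    keep = rng.random(x.shape) >= mask_rate
    return x * keep


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_graph(h, w_q, w_k):
    """One scaled dot-product attention head, read as a dense weighted cell-cell graph."""
    q, k = h @ w_q, h @ w_k
    return softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)


def drop_edges(a, drop_rate, rng):
    """Structural augmentation: randomly remove attention edges, then renormalize rows."""
    keep = rng.random(a.shape) >= drop_rate
    np.fill_diagonal(keep, True)  # keep self-loops so no row ends up empty
    a = a * keep
    return a / a.sum(axis=1, keepdims=True)


def symmetric_info_nce(z1, z2, tau=0.5):
    """Cell i in view 1 should match cell i in view 2 (and vice versa)."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau  # cosine similarities scaled by temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cells, n_genes, d = 8, 20, 16
    x = rng.poisson(1.0, size=(n_cells, n_genes)).astype(float)  # toy counts
    w_in = rng.normal(size=(n_genes, d)) / np.sqrt(n_genes)  # hypothetical projection
    w_q = rng.normal(size=(d, d)) / np.sqrt(d)
    w_k = rng.normal(size=(d, d)) / np.sqrt(d)

    # Original view: embed cells, build the attention graph, aggregate neighbors.
    h = x @ w_in
    z_orig = attention_graph(h, w_q, w_k) @ h

    # Augmented view: mask genes, rebuild the graph, then drop edges.
    h_aug = mask_genes(x, 0.2, rng) @ w_in
    z_aug = drop_edges(attention_graph(h_aug, w_q, w_k), 0.2, rng) @ h_aug

    print("contrastive loss:", symmetric_info_nce(z_orig, z_aug))
```

Averaging the loss over both directions (view 1 to view 2 and view 2 to view 1) is what makes it symmetric. Keeping self-loops when dropping edges is one simple way to guarantee that every cell still aggregates at least its own embedding.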
Matching journals
The top 5 journals account for 50% of the predicted probability mass.