
Clustering Strategies Improve Structure-Preserving Visualization of Single-Cell RNA-seq Data with CBMAP

Alchaar, M.; Dogan, B.

2026-05-04 bioinformatics
10.64898/2026.04.30.721861 bioRxiv

Dimensionality reduction for visualization is a fundamental step in single-cell RNA sequencing (scRNA-seq) analysis due to the extremely high dimensionality of gene expression profiles. However, widely used nonlinear embedding techniques such as UMAP and t-SNE can introduce substantial distortions when projecting data into two-dimensional space, potentially altering global organization, local neighborhoods, and distance relationships in ways that may mislead downstream biological interpretation. In this study, we investigate the applicability of Clustering-Based Manifold Approximation and Projection (CBMAP) for the visualization of scRNA-seq data and systematically examine how clustering strategies influence the quality of the resulting embeddings. CBMAP was integrated with several clustering algorithms commonly used in single-cell analysis, including k-means, Leiden, HDBSCAN, Secuer, HGC, and FlowSOM. The resulting embeddings were evaluated using quantitative metrics that measure global, local, and distance-level structure preservation and were compared with widely used dimensionality reduction methods such as UMAP, t-SNE, and PaCMAP across multiple benchmark datasets. Our results demonstrate that the clustering stage plays a critical role in determining the structural fidelity of CBMAP embeddings. Clustering algorithms specifically designed for single-cell transcriptomic data, particularly Secuer, produced more consistent preservation of global relationships between cell populations. Across multiple datasets, CBMAP more faithfully preserved global structural organization and inter-population distance relationships than the compared methods, although local neighborhood preservation was generally weaker than in techniques optimized for local structure. Importantly, CBMAP embeddings retained biologically meaningful relationships in trajectory benchmark datasets. 
When combined with RNA velocity analysis, CBMAP successfully preserved cyclic progenitor states and branching differentiation trajectories, demonstrating compatibility with trajectory-aware visualization. These findings indicate that CBMAP provides a structure-faithful visualization framework for scRNA-seq data and that clustering selection plays a central role in determining embedding quality.
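The abstract does not say which metrics were used to score local and distance-level structure preservation. The sketch below is a hedged illustration of two common choices, not the authors' evaluation code. The first is the mean fraction of each cell's k nearest neighbors that survive the projection (local structure). The second is the Spearman rank correlation between all pairwise distances before and after projection (distance-level structure). The function names and toy data are hypothetical. The pure-Python O(n²) loops are only suitable for small examples, and global, population-level metrics are not covered here.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of point i, excluding i itself.
    Ties are broken by original index order because list.sort is stable."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: euclidean(points[i], points[j]))
    return set(others[:k])


def knn_preservation(high, low, k):
    """Local structure: mean fraction of each point's k nearest neighbors in the
    high-dimensional space that are also among its k nearest in the embedding."""
    total = 0.0
    for i in range(len(high)):
        shared = knn_indices(high, i, k) & knn_indices(low, i, k)
        total += len(shared) / k
    return total / len(high)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            r[order[j]] = avg
        pos = end + 1
    return r


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def distance_spearman(high, low):
    """Distance-level structure: Spearman correlation (Pearson on ranks) between
    all pairwise distances before and after projection."""
    pairs = list(combinations(range(len(high)), 2))
    d_high = [euclidean(high[i], high[j]) for i, j in pairs]
    d_low = [euclidean(low[i], low[j]) for i, j in pairs]
    return pearson(ranks(d_high), ranks(d_low))


if __name__ == "__main__":
    # Toy data: two tight groups in 3-D; the "embedding" simply drops the z coordinate,
    # so every pairwise distance is kept exactly.
    high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
            (5.0, 5.0, 0.0), (6.0, 5.0, 0.0), (5.0, 6.0, 0.0)]
    low = [(x, y) for x, y, _ in high]
    print(knn_preservation(high, low, k=2))  # 1.0: every neighbor set is kept
    print(distance_spearman(high, low))
```

In a real comparison of CBMAP, UMAP, t-SNE, or PaCMAP, `high` would typically be a reduced expression matrix (for example, principal components) and `low` the 2-D coordinates. A vectorized library implementation would replace these loops.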

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology: 1633 papers in training set; Top 2%; 12.4%
2. NAR Genomics and Bioinformatics: 214 papers in training set; Top 0.1%; 10.1%
3. Bioinformatics: 1061 papers in training set; Top 3%; 7.2%
4. BMC Bioinformatics: 383 papers in training set; Top 1%; 6.8%
5. PLOS ONE: 4510 papers in training set; Top 28%; 6.3%
6. Bioinformatics Advances: 184 papers in training set; Top 0.7%; 4.9%
7. Briefings in Bioinformatics: 326 papers in training set; Top 2%; 3.6%

50% of probability mass above

8. Scientific Reports: 3102 papers in training set; Top 36%; 3.6%
9. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 2%; 3.1%
10. iScience: 1063 papers in training set; Top 9%; 2.4%
11. GigaScience: 172 papers in training set; Top 1%; 1.7%
12. Nucleic Acids Research: 1128 papers in training set; Top 11%; 1.7%
13. BMC Genomics: 328 papers in training set; Top 3%; 1.7%
14. Journal of Chemical Information and Modeling: 207 papers in training set; Top 2%; 1.5%
15. npj Systems Biology and Applications: 99 papers in training set; Top 1%; 1.3%
16. Frontiers in Cell and Developmental Biology: 218 papers in training set; Top 5%; 1.3%
17. Frontiers in Bioinformatics: 45 papers in training set; Top 0.4%; 1.3%
18. Advanced Science: 249 papers in training set; Top 14%; 1.2%
19. Frontiers in Genetics: 197 papers in training set; Top 7%; 1.1%
20. Life Science Alliance: 263 papers in training set; Top 1%; 0.9%
21. Genome Biology: 555 papers in training set; Top 6%; 0.9%
22. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 2%; 0.8%
23. Analytical Chemistry: 205 papers in training set; Top 2%; 0.8%
24. Frontiers in Plant Science: 240 papers in training set; Top 5%; 0.8%
25. International Journal of Molecular Sciences: 453 papers in training set; Top 15%; 0.7%
26. Physical Biology: 43 papers in training set; Top 2%; 0.7%
27. Patterns: 70 papers in training set; Top 2%; 0.7%
28. Communications Biology: 886 papers in training set; Top 26%; 0.7%
29. Cell Reports Methods: 141 papers in training set; Top 6%; 0.6%
30. Frontiers in Molecular Biosciences: 100 papers in training set; Top 6%; 0.6%
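The "50% of probability mass" cutoff can be reproduced by accumulating the listed probabilities in rank order until the running total reaches 50%. The sketch below assumes that this is how the cutoff is defined, which the page does not state. It works from the rounded percentages shown, so totals may differ slightly from the site's internal values. The function name `journals_for_mass` is hypothetical.

```python
# Predicted probabilities (%) for the ten top-ranked journals, copied from the list above.
probs = [12.4, 10.1, 7.2, 6.8, 6.3, 4.9, 3.6, 3.6, 3.1, 2.4]


def journals_for_mass(probs, target):
    """Return (n, cumulative) for the smallest number n of top-ranked journals whose
    cumulative probability reaches target, or (None, total) if it is never reached."""
    total = 0.0
    for n, p in enumerate(probs, start=1):
        total += p
        if total >= target:
            return n, total
    return None, total


if __name__ == "__main__":
    n, total = journals_for_mass(probs, 50.0)
    print(n)  # 7: the running total is 47.7% after six journals and about 51.3% after seven
```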