Cell Type Weighted Dimensionality Reduction

Putta, S.; Jensen, W.; Devakonda, S.; Pennell, L.; Croteau, J.

2026-05-05 bioinformatics
10.64898/2026.04.30.721796 bioRxiv

High-dimensional single-cell technologies, such as flow cytometry and CITE-Seq, typically rely on established lineage markers to define cell identities. Additional markers are commonly analyzed within the context of these predefined cell types. Nonlinear projection methods such as t-SNE and UMAP provide a visual framework for this analysis by enabling the overlay of cell types and marker expression. However, these methods frequently produce projections where distinct cell types substantially overlap, hindering interpretation of marker expression patterns relative to known cell types. In this study, we investigate the underlying causes of this phenomenon and demonstrate that such overlaps often stem from the inherent high-dimensional structure of the data rather than limitations in the dimensionality reduction algorithms themselves. To address this, we introduce Cell Type Weighted Dimensionality Reduction (CWDR), a novel approach that incorporates lineage-based information through a supervised weighting mechanism. By integrating both cell identity and marker expression, CWDR preserves the visual separation between predefined cell types while maintaining the local variance necessary for downstream analysis. We validate our method across multiple high-dimensional flow cytometry and proteogenomic datasets. Our results show that CWDR significantly reduces inter-cluster overlap compared to traditional methods, providing a clearer framework for visualizing marker expression within the context of specific cell lineages.
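The abstract does not spell out how the supervised weighting is computed, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes the cell-type label is one-hot encoded, scaled by a weight, and appended to the standardized marker matrix before projection. PCA (via NumPy's SVD) stands in for t-SNE or UMAP to keep the example self-contained. The names `label_weighted_pca`, `centroid_separation`, and the `weight` parameter are hypothetical.

```python
import numpy as np

def label_weighted_pca(X, labels, weight=1.0, n_components=2):
    """Project cells after appending weighted one-hot cell-type columns.

    Assumption: this augmentation scheme is illustrative only; the
    abstract does not describe the actual CWDR weighting mechanism.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).ravel()

    # Standardize each marker so no single marker dominates the distances.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # One-hot encode the predefined cell types and scale by the weight.
    classes, codes = np.unique(labels, return_inverse=True)
    onehot = np.eye(len(classes))[codes]
    A = np.hstack([Z, weight * onehot])

    # PCA via SVD on the centered, augmented matrix.
    A = A - A.mean(axis=0)
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def centroid_separation(E, labels):
    """Distance between two class centroids relative to within-class spread."""
    labels = np.asarray(labels).ravel()
    classes = np.unique(labels)
    a, b = E[labels == classes[0]], E[labels == classes[1]]
    between = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    within = 0.5 * (a.std(axis=0).mean() + b.std(axis=0).mean())
    return between / within
```

Under this assumed scheme, two cells of the same type have identical one-hot columns, so their distance in the augmented space equals their distance in marker space; only distances between cells of different types grow with `weight`. That is one way a weighting could separate predefined cell types while leaving within-type structure intact.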

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. PLOS ONE: 14.1% (4510 papers in training set, Top 14%)
2. Bioinformatics: 14.1% (1061 papers in training set, Top 2%)
3. PLOS Computational Biology: 9.9% (1633 papers in training set, Top 3%)
4. BMC Bioinformatics: 7.0% (383 papers in training set, Top 1%)
5. Scientific Reports: 6.2% (3102 papers in training set, Top 20%)

50% of probability mass above

6. Nucleic Acids Research: 3.5% (1128 papers in training set, Top 6%)
7. NAR Genomics and Bioinformatics: 3.5% (214 papers in training set, Top 0.8%)
8. Cytometry Part A: 3.5% (30 papers in training set, Top 0.1%)
9. Genome Biology: 2.0% (555 papers in training set, Top 4%)
10. Bioinformatics Advances: 2.0% (184 papers in training set, Top 2%)
11. iScience: 1.9% (1063 papers in training set, Top 12%)
12. Nature Communications: 1.5% (4913 papers in training set, Top 54%)
13. Communications Biology: 1.5% (886 papers in training set, Top 11%)
14. Journal of Cell Science: 1.5% (353 papers in training set, Top 1%)
15. Proceedings of the National Academy of Sciences: 1.3% (2130 papers in training set, Top 37%)
16. Frontiers in Bioinformatics: 1.2% (45 papers in training set, Top 0.5%)
17. Molecular & Cellular Proteomics: 0.9% (158 papers in training set, Top 1%)
18. Cell Systems: 0.9% (167 papers in training set, Top 11%)
19. Frontiers in Genetics: 0.9% (197 papers in training set, Top 8%)
20. Journal of Computational Biology: 0.9% (37 papers in training set, Top 0.4%)
21. Life Science Alliance: 0.7% (263 papers in training set, Top 2%)
22. Advanced Science: 0.7% (249 papers in training set, Top 20%)
23. Patterns: 0.6% (70 papers in training set, Top 3%)