Faithful Supervised Dimensionality Reduction for Biomedical Data via Decision Geometry

Wang, Z.; Zhou, Z.; Zhan, Q.; Shen, L.

2026-05-26 bioinformatics
10.64898/2026.05.21.727018 bioRxiv

Unsupervised dimensionality reduction methods aim to preserve intrinsic data geometry by maintaining local neighborhoods and approximate global relationships in low-dimensional embeddings, but they do not use label information and therefore may fail to reflect task-relevant class structure in biomedical and health applications. Supervised dimensionality reduction (SDR) incorporates labels to improve class organization, yet existing approaches often face a trade-off between discrimination and geometric faithfulness. Linear supervised methods are stable and interpretable but are limited in their ability to capture nonlinear structure, whereas many nonlinear methods impose supervision directly in the embedding space, which can over-separate classes and distort the underlying manifold. In biomedical applications, labels such as cell types in single-cell data or patient status in clinical cohorts provide meaningful biological signal, and supervised dimensionality reduction can use this information to produce more informative low-dimensional representations. Here we propose a new framework, DG-UMAP (Decision-Geometry UMAP), for faithful supervised dimensionality reduction via decision geometry. We first fit a classifier in the original feature space and use its boundary-local decision geometry to construct a low-rank metric deformation that emphasizes discriminative directions while limiting geometric distortion. Parametric UMAP is then applied to the transformed space, so supervision acts through the ambient geometry rather than by directly forcing class separation in the embedding. Across synthetic and multiple real-world biomedical datasets, our method yields embeddings with improved agreement with class structure and global organization while preserving local neighborhood quality.
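The pipeline described in the abstract (fit a classifier, derive a low-rank metric deformation from its decision geometry, then embed the transformed data) can be illustrated with a deliberately simplified toy. The sketch below is not the authors' DG-UMAP implementation. It assumes a linear logistic classifier, whose boundary normal is the same everywhere, so the deformation collapses to rank one: M = I + alpha * u u^T. The function names, the `alpha` parameter, and the toy data are all hypothetical, and the final embedding step is left as a comment.

```python
import math
import random

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a plain logistic-regression classifier by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def rank_one_deformation(X, w, alpha=3.0):
    """Stretch every point along the unit normal u of the decision boundary.

    This realizes the metric M = I + alpha * u u^T (an assumed form):
    distances along u grow by sqrt(1 + alpha), while distances orthogonal
    to u are left unchanged.
    """
    norm = math.sqrt(sum(wj * wj for wj in w))
    u = [wj / norm for wj in w]
    s = math.sqrt(1.0 + alpha) - 1.0
    X_def = []
    for xi in X:
        proj = sum(uj * xj for uj, xj in zip(u, xi))
        X_def.append([xj + s * proj * uj for xj, uj in zip(xi, u)])
    return X_def, u

random.seed(0)
# Toy data: two Gaussian blobs separated along the first axis.
X = [[random.gauss(-1.0, 0.5), random.gauss(0.0, 1.0)] for _ in range(50)]
X += [[random.gauss(1.0, 0.5), random.gauss(0.0, 1.0)] for _ in range(50)]
y = [0] * 50 + [1] * 50

w, b = fit_logistic(X, y)
X_def, u = rank_one_deformation(X, w, alpha=3.0)
# X_def would then be handed to a neighbor-graph embedding such as UMAP,
# so supervision enters only through the deformed input geometry.
```

For a nonlinear classifier the boundary normal varies from point to point. One plausible extension, again an assumption rather than the paper's stated procedure, would average outer products of classifier gradients at points near the boundary and keep the leading eigenvectors.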

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Nature Communications | 4913 | Top 13% | 12.6% |
| 2 | Proceedings of the National Academy of Sciences | 2130 | Top 5% | 10.5% |
| 3 | Bioinformatics | 1061 | Top 3% | 8.5% |
| 4 | Advanced Science | 249 | Top 2% | 6.9% |
| 5 | Science Advances | 1098 | Top 4% | 4.0% |
| 6 | Cell Systems | 167 | Top 5% | 3.1% |
| 7 | Scientific Reports | 3102 | Top 43% | 2.8% |
| 8 | PLOS Computational Biology | 1633 | Top 12% | 2.8% |
| | 50% of probability mass above | | | |
| 9 | Nucleic Acids Research | 1128 | Top 7% | 2.8% |
| 10 | Communications Biology | 886 | Top 3% | 2.8% |
| 11 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.1% | 2.8% |
| 12 | Nature Computational Science | 50 | Top 0.3% | 2.6% |
| 13 | PLOS ONE | 4510 | Top 46% | 2.4% |
| 14 | Briefings in Bioinformatics | 326 | Top 3% | 1.9% |
| 15 | Journal of Computational Biology | 37 | Top 0.1% | 1.9% |
| 16 | Genome Biology | 555 | Top 4% | 1.8% |
| 17 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 1% | 1.7% |
| 18 | Nature Methods | 336 | Top 5% | 1.5% |
| 19 | Nature Biotechnology | 147 | Top 5% | 1.3% |
| 20 | Medical Image Analysis | 33 | Top 0.7% | 1.3% |
| 21 | Bioinformatics Advances | 184 | Top 4% | 1.2% |
| 22 | Patterns | 70 | Top 2% | 1.0% |
| 23 | eLife | 5422 | Top 53% | 0.9% |
| 24 | Frontiers in Molecular Biosciences | 100 | Top 4% | 0.8% |
| 25 | Nature Machine Intelligence | 61 | Top 3% | 0.8% |
| 26 | BMC Bioinformatics | 383 | Top 6% | 0.8% |
| 27 | Ecology Letters | 121 | Top 1% | 0.8% |
| 28 | Physical Review Research | 46 | Top 0.8% | 0.8% |
| 29 | IEEE Access | 31 | Top 1% | 0.6% |
| 30 | JMIR Medical Informatics | 17 | Top 2% | 0.6% |
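
The 50% cutoff in the ranking above can be checked with a short cumulative sum. This sketch assumes the cutoff is the smallest rank at which the running total of predicted probabilities reaches 50%. That rule is an assumption about how the page computes it, and the listed percentages are rounded, so the total is approximate. The helper name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (percent) for ranks 1 to 10, as listed.
probs = [12.6, 10.5, 8.5, 6.9, 4.0, 3.1, 2.8, 2.8, 2.8, 2.8]

def journals_to_reach(probs, threshold):
    """Return (rank, cumulative mass) at the first rank reaching the threshold."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total  # threshold never reached

rank, mass = journals_to_reach(probs, 50.0)
print(rank, round(mass, 1))  # prints: 8 51.2
```

The top 7 journals sum to 48.4%, just short of the threshold, which is consistent with the marker sitting after rank 8.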