Novel Parameter-Free and Interpretable Integration of CITE-seq RNA and ADT Profiles via Tensor Decomposition-Based Unsupervised Feature Extraction

Taguchi, Y.-h.; Turki, T.

2026-04-21 bioinformatics
10.64898/2026.04.18.719420 bioRxiv

CITE-seq jointly profiles cellular transcripts and surface proteins, but integrating RNA and antibody-derived tags (ADTs) remains challenging because the two modalities differ markedly in dimensionality, sparsity, and noise characteristics. We present a tensor decomposition-based unsupervised feature extraction framework for the parameter-free integration of CITE-seq data. By constructing a gene × cell × protein tensor and applying higher-order singular value decomposition (HOSVD), the method derives shared latent representations of genes, cells, and proteins without prior gene filtering or modality-weight tuning. Across five ImmGen T-cell CITE-seq datasets, the resulting cell embeddings were generally more consistent with annotated cell types than RNA-only, protein-only, or totalVI-based embeddings, whereas organ-level consistency did not improve. The latent factors also enabled post hoc unsupervised gene selection, and the selected genes showed biologically meaningful enrichment for T-cell-related terms. In addition, the method's failure on a poor-quality dataset served as a useful quality-control signal. Together with a blocked sparse-matrix implementation for large tensors, these results indicate that tensor decomposition-based unsupervised feature extraction provides an interpretable, scalable, and competitive approach for integrating RNA and ADT measurements in CITE-seq experiments.
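
The tensor-plus-HOSVD step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say how the gene × cell × protein tensor is built, so the sketch assumes each entry is the product of a gene's RNA value and a protein's ADT value in the same cell. The toy matrices `rna` and `adt`, the helper names `unfold`, `hosvd`, and `reconstruct`, and the dense (non-blocked, non-sparse) computation are likewise illustrative assumptions.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd(tensor):
    """Full HOSVD: one orthonormal factor matrix per mode plus a core tensor."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give that mode's factor matrix.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        # Mode-n product with u.T projects the tensor onto the factor basis.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors


def reconstruct(core, factors):
    """Invert the decomposition by applying each factor matrix to the core."""
    out = core
    for mode, u in enumerate(factors):
        out = np.moveaxis(np.tensordot(u, out, axes=(1, mode)), 0, mode)
    return out


# Toy data (hypothetical): genes x cells RNA counts, proteins x cells ADT counts.
rng = np.random.default_rng(0)
n_genes, n_cells, n_proteins = 30, 12, 5
rna = rng.poisson(1.0, size=(n_genes, n_cells)).astype(float)
adt = rng.poisson(5.0, size=(n_proteins, n_cells)).astype(float)

# ASSUMPTION: tensor[i, j, k] = rna[i, j] * adt[k, j], with cells as the shared index.
tensor = np.einsum("ij,kj->ijk", rna, adt)

core, factors = hosvd(tensor)
gene_u, cell_u, protein_u = factors  # latent representations of each mode
```

In this sketch the columns of `cell_u` play the role of cell embeddings, while `gene_u` and `protein_u` hold the matching gene and protein factors. A dense tensor of this kind grows as genes × cells × proteins, which is presumably why the abstract mentions a blocked sparse-matrix implementation for large datasets.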

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Communications: 4913 papers in training set, Top 12%, 14.0%
2. Nature Biotechnology: 147 papers in training set, Top 0.5%, 12.1%
3. Genome Biology: 555 papers in training set, Top 0.7%, 8.2%
4. Nature Methods: 336 papers in training set, Top 1%, 8.2%
5. Advanced Science: 249 papers in training set, Top 3%, 6.2%
6. Bioinformatics: 1061 papers in training set, Top 5%, 4.7%
50% of probability mass above
7. Cell Systems: 167 papers in training set, Top 3%, 4.7%
8. Nucleic Acids Research: 1128 papers in training set, Top 5%, 3.9%
9. PLOS Computational Biology: 1633 papers in training set, Top 12%, 2.8%
10. Communications Biology: 886 papers in training set, Top 6%, 2.0%
11. Genome Research: 409 papers in training set, Top 2%, 2.0%
12. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 30%, 1.8%
13. Cell Reports Methods: 141 papers in training set, Top 2%, 1.8%
14. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.7%
15. Genome Medicine: 154 papers in training set, Top 5%, 1.7%
16. Patterns: 70 papers in training set, Top 1.0%, 1.7%
17. Cell Genomics: 162 papers in training set, Top 4%, 1.3%
18. Nano Letters: 63 papers in training set, Top 2%, 1.2%
19. iScience: 1063 papers in training set, Top 27%, 0.9%
20. Nature Machine Intelligence: 61 papers in training set, Top 3%, 0.8%
21. Cell Reports: 1338 papers in training set, Top 34%, 0.7%
22. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 10%, 0.7%
23. Scientific Reports: 3102 papers in training set, Top 77%, 0.7%
24. Nature Biomedical Engineering: 42 papers in training set, Top 3%, 0.6%
25. GigaScience: 172 papers in training set, Top 4%, 0.6%
26. Briefings in Bioinformatics: 326 papers in training set, Top 8%, 0.6%
27. Science Advances: 1098 papers in training set, Top 34%, 0.6%
28. eLife: 5422 papers in training set, Top 62%, 0.6%
29. Cell: 370 papers in training set, Top 19%, 0.6%
30. Frontiers in Genetics: 197 papers in training set, Top 12%, 0.6%