
scDisent: disentangled representation learning with causal structure for multi-omic single-cell analysis

Xi, G.

2026-04-16 bioinformatics
10.64898/2026.04.12.717909 bioRxiv

Single-cell multi-omic technologies measure complementary aspects of cellular identity and regulatory state, yet most integration models compress these signals into one entangled latent space. Such representations are useful for clustering but poorly suited for mechanistic interpretation or perturbation-oriented analysis. We present scDisent (https://github.com/xiguoren/scDisent), a generative framework for disentangled representation learning that separates expression-associated variables (z_expr) from regulation-associated variables (z_reg) and links them through a sparse directed mapping. scDisent combines modality-specific encoding, variational disentanglement with total-correlation and orthogonality constraints, and a Gumbel-gated causal module protected by detach-based gradient isolation. Evaluated on benchmark datasets with matched modalities, scDisent achieved best-in-benchmark integration performance while exposing regulatory structure that competing integration methods do not model explicitly. The learned causal atlas remained sparse, perturbation analyses recovered biologically coherent lineage-associated programs, and cross-dataset discovery analyses highlighted interpretable immune, neural and developmental signatures. Quantitative branch-separation analyses further showed that benchmark-label information concentrated in z_expr rather than z_reg. Together, these results position scDisent as a computational method that improves not only integration quality but also biological interpretability, making single-cell multi-omic representations better suited to biological question answering and in silico hypothesis generation.
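
To make the idea of a Gumbel-gated sparse directed mapping more concrete, here is a minimal sketch in plain Python. It is not the scDisent implementation. It assumes the mapping runs from z_reg to z_expr (the abstract does not state the direction), and the names `gumbel_sigmoid`, `gated_map`, `weights` and `gate_logits` are hypothetical. Each candidate edge gets a relaxed on/off gate sampled with logistic (Gumbel-difference) noise; edges with strongly negative logits are effectively switched off, which is what keeps the mapping sparse. The detach-based gradient isolation mentioned in the abstract needs an autograd framework and is only indicated in a comment.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gumbel_sigmoid(logit, tau, rng):
    """Relaxed Bernoulli gate between 0 and 1 (binary Gumbel/Concrete trick)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep the logs finite
    noise = math.log(u) - math.log(1.0 - u)  # difference of two Gumbel(0, 1) draws
    return sigmoid((logit + noise) / tau)


def gated_map(z_reg, weights, gate_logits, tau=0.5, hard=False, rng=None):
    """Predict z_expr from z_reg through per-edge gates (assumed direction).

    In an autograd framework, z_reg would be detached before this call so that
    the causal module's loss cannot reshape the encoder. That step is not
    reproduced here because plain Python has no gradients.
    """
    rng = rng or random.Random(0)
    n_reg, n_expr = len(weights), len(weights[0])
    z_expr_pred = [0.0] * n_expr
    for i in range(n_reg):
        for j in range(n_expr):
            if hard:
                # Deterministic gate, e.g. for reading out the learned graph.
                gate = 1.0 if gate_logits[i][j] > 0 else 0.0
            else:
                gate = gumbel_sigmoid(gate_logits[i][j], tau, rng)
            z_expr_pred[j] += gate * weights[i][j] * z_reg[i]
    return z_expr_pred


# Toy example: 2 regulatory latents, 2 expression latents, 2 active edges.
z_reg = [1.0, 2.0]
weights = [[0.5, -1.0], [2.0, 0.25]]
gate_logits = [[4.0, -4.0], [-4.0, 4.0]]
print(gated_map(z_reg, weights, gate_logits, hard=True))
```

With `hard=True` only the edges z_reg[0] -> z_expr[0] and z_reg[1] -> z_expr[1] contribute. With the default soft gates the output varies with the sampled noise, and lowering `tau` pushes the gates toward 0 or 1.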

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank. Journal; papers in training set; percentile; predicted probability

1. Nature Methods; 336 papers in training set; Top 0.5%; 17.2%
2. Genome Biology; 555 papers in training set; Top 0.2%; 12.4%
3. Nature Biotechnology; 147 papers in training set; Top 0.5%; 12.1%
4. Nature Communications; 4913 papers in training set; Top 19%; 9.9%

50% of probability mass above

5. Cell Systems; 167 papers in training set; Top 2%; 6.7%
6. Nucleic Acids Research; 1128 papers in training set; Top 5%; 3.9%
7. Advanced Science; 249 papers in training set; Top 6%; 3.5%
8. Nature; 575 papers in training set; Top 7%; 3.5%
9. Genome Medicine; 154 papers in training set; Top 3%; 2.5%
10. Bioinformatics; 1061 papers in training set; Top 7%; 2.0%
11. Nature Machine Intelligence; 61 papers in training set; Top 2%; 1.8%
12. Nature Genetics; 240 papers in training set; Top 5%; 1.6%
13. Nature Cell Biology; 99 papers in training set; Top 3%; 1.3%
14. Briefings in Bioinformatics; 326 papers in training set; Top 5%; 1.3%
15. Cell Genomics; 162 papers in training set; Top 5%; 1.2%
16. Science; 429 papers in training set; Top 17%; 1.2%
17. Genome Research; 409 papers in training set; Top 3%; 1.2%
18. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 40%; 0.9%
19. Patterns; 70 papers in training set; Top 2%; 0.9%
20. Nature Microbiology; 133 papers in training set; Top 4%; 0.8%
21. PLOS Computational Biology; 1633 papers in training set; Top 24%; 0.8%
22. Cell; 370 papers in training set; Top 16%; 0.8%
23. Nature Computational Science; 50 papers in training set; Top 2%; 0.7%
24. Cell Reports Methods; 141 papers in training set; Top 5%; 0.7%
25. Communications Biology; 886 papers in training set; Top 30%; 0.6%