
DeSpotX: Identifiability-Based Decontamination for Spatial Transcriptomics

Wang, R. H.; Gentles, A. J.

2026-05-14 bioinformatics
10.64898/2026.05.12.724704 bioRxiv

Spatial transcriptomics (ST) at single-cell resolution profiles gene expression in its native spatial context, but a substantial fraction of transcripts contaminate neighboring cells, compromising downstream biological analyses. Existing decontamination methods rely on heuristic priors and either ignore the spatial structure of contamination or aggregate over neighbors without separating contamination from native expression, leaving the decomposition ambiguous. To resolve this ambiguity, we introduce DeSpotX, a deep generative model that uses anchor genes, defined as genes not natively expressed in a given cell cluster, to constrain the contamination decomposition and make it identifiable. DeSpotX further uses spatial information to estimate contamination locally through a cluster-masked, distance-weighted average over neighboring cells, and prevents over-correction of low-expression signal through a learned diffusion prior. On spike-in simulations across five datasets and four ST platforms, DeSpotX achieves AUROC > 0.94 on every dataset, with gains of 0.02 to 0.12 over the best baseline, and remains robust to inaccuracies in the cell-cluster annotation and in anchor gene construction. On real tissues, we show that the decontaminated counts produce improved marker-gene specificity, more spatially coherent expression, and cell-cell communication networks consistent with known biology. We further show that iterating decontamination and cell-cluster annotation refines these outcomes, reassigning ligand-receptor signaling to the expected source cells in mouse brain and breast cancer tissues.
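The abstract does not give DeSpotX's equations, so the sketch below is only a rough illustration of how its two named ingredients could fit together: anchor genes and a cluster-masked, distance-weighted average over neighboring cells. Everything specific here is an assumption made for this example. This includes the detection-fraction rule for anchors (`max_frac`), the Gaussian weights (`sigma`), the hard cutoff (`radius`), the choice to mask out same-cluster neighbors, and the simple moment estimate of a cell's contamination fraction. DeSpotX itself is a deep generative model with a learned diffusion prior, which this sketch does not attempt to reproduce.

```python
import numpy as np


def anchor_genes(counts, clusters, max_frac=0.01):
    """Hypothetical anchor rule: a gene is an anchor for cluster c if it is
    detected (count > 0) in at most `max_frac` of the cells in c."""
    anchors = {}
    for c in np.unique(clusters):
        detected = (counts[clusters == c] > 0).mean(axis=0)
        anchors[c] = np.flatnonzero(detected <= max_frac)
    return anchors


def local_contamination_profile(counts, coords, clusters, sigma=10.0, radius=30.0):
    """Cluster-masked, distance-weighted average of neighboring cells' profiles.

    Assumptions: Gaussian distance weights, a hard radius cutoff, and a mask that
    drops neighbors in the focal cell's own cluster (which also drops the cell itself).
    """
    lib = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    profiles = counts / lib  # per-cell relative expression (rows sum to 1)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.exp(-d**2 / (2 * sigma**2))
    w[d > radius] = 0.0
    w[clusters[:, None] == clusters[None, :]] = 0.0  # cluster mask
    w_sum = w.sum(axis=1, keepdims=True)
    # Cells with no eligible neighbors get an all-zero profile.
    return np.divide(w @ profiles, w_sum,
                     out=np.zeros_like(profiles), where=w_sum > 0)


def contamination_fraction(cell_counts, profile, anchor_idx):
    """Moment estimate (hypothetical): all anchor-gene counts are attributed to
    contamination and compared with what the local profile predicts."""
    expected = cell_counts.sum() * profile[anchor_idx].sum()
    if expected <= 0:
        return 0.0
    return float(min(1.0, cell_counts[anchor_idx].sum() / expected))


# Toy data: 4 cells, 3 genes (gene 0 marks cluster 0, gene 1 marks cluster 1).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [50.0, 50.0]])
clusters = np.array([0, 1, 0, 0])
counts = np.array([[30, 3, 7], [0, 40, 10], [25, 0, 5], [30, 0, 10]])

anchors = anchor_genes(counts, clusters, max_frac=0.34)
prof = local_contamination_profile(counts, coords, clusters)
rho0 = contamination_fraction(counts[0], prof[0], anchors[0])
print(rho0)  # 0.09375: 3 anchor counts vs. 40 * 0.8 expected
```

In this toy version the contaminating profile is built only from other-cluster neighbors, so a cluster's anchor-gene counts can only be explained by contamination. That is what pins down the fraction, loosely mirroring the identifiability argument in the abstract.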

Matching journals

The top 4 journals together account for just over 50% of the predicted probability mass (54.5%).

1. Nature Biotechnology: 18.2% (147 papers in training set; top 0.2%)
2. Nature Methods: 18.2% (336 papers in training set; top 0.4%)
3. Cell Systems: 9.9% (167 papers in training set; top 1%)
4. Nature Communications: 8.2% (4913 papers in training set; top 24%)
(50% of probability mass above)
5. Genome Biology: 6.7% (555 papers in training set; top 1.0%)
6. Bioinformatics: 3.9% (1061 papers in training set; top 5%)
7. Nature: 3.5% (575 papers in training set; top 7%)
8. Science: 3.5% (429 papers in training set; top 9%)
9. Nature Genetics: 2.0% (240 papers in training set; top 4%)
10. Nature Machine Intelligence: 1.8% (61 papers in training set; top 2%)
11. Genome Research: 1.7% (409 papers in training set; top 2%)
12. The American Journal of Human Genetics: 1.7% (206 papers in training set; top 2%)
13. Nature Biomedical Engineering: 1.3% (42 papers in training set; top 1%)
14. Proceedings of the National Academy of Sciences: 1.1% (2130 papers in training set; top 39%)
15. Genome Medicine: 0.9% (154 papers in training set; top 7%)
16. Advanced Science: 0.9% (249 papers in training set; top 17%)
17. Nucleic Acids Research: 0.9% (1128 papers in training set; top 16%)
18. PLOS ONE: 0.9% (4510 papers in training set; top 64%)
19. Nature Computational Science: 0.8% (50 papers in training set; top 2%)
20. Briefings in Bioinformatics: 0.7% (326 papers in training set; top 7%)
21. Cell: 0.7% (370 papers in training set; top 17%)
22. Cell Genomics: 0.7% (162 papers in training set; top 7%)
23. Nature Microbiology: 0.7% (133 papers in training set; top 5%)
24. Science Advances: 0.6% (1098 papers in training set; top 34%)
25. Nature Cell Biology: 0.6% (99 papers in training set; top 5%)
26. Nature Chemical Biology: 0.6% (104 papers in training set; top 4%)
27. PLOS Computational Biology: 0.6% (1633 papers in training set; top 28%)
28. Scientific Reports: 0.6% (3102 papers in training set; top 78%)
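As a quick check on the 50% cutoff, the cumulative predicted probability over the ranked list can be computed directly. The values below are copied from the list in rank order, and `journals_for_mass` is a hypothetical helper written only for this example.

```python
# Predicted probabilities (%) copied from the ranked list, in rank order (1 to 28).
PROBS = [18.2, 18.2, 9.9, 8.2, 6.7, 3.9, 3.5, 3.5, 2.0, 1.8, 1.7, 1.7, 1.3, 1.1,
         0.9, 0.9, 0.9, 0.9, 0.8, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6]


def journals_for_mass(probs, target):
    """Smallest k such that the top-k probabilities sum to at least `target`
    percent, or None if the listed journals never reach it."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= target:
            return k
    return None


print(journals_for_mass(PROBS, 50))  # 4 (18.2 + 18.2 + 9.9 + 8.2 = 54.5)
print(round(sum(PROBS), 1))          # 91.9
```

The 28 listed journals together cover 91.9% of the predicted probability mass.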