
Atlas-Level Single-Cell and Spatial Transcriptomics Data Integration via PRIME

Wu, X.; Wang, X.; Wang, J.; Wan, S.

2026-05-23 bioinformatics
10.64898/2026.05.20.726698 bioRxiv

Single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) have enabled atlas-scale cellular cartography, with consortium efforts now assembling millions of cells across diverse tissues, donors, and technologies to build comprehensive references for cell identity and disease mechanisms. Yet the scientific value of these atlases hinges on robust computational integration across heterogeneous data sources. Unlike pairwise batch correction, atlas-level integration must jointly reconcile heterogeneous and often hierarchically nested batch effects across many datasets whose cell-type compositions are highly imbalanced, all while preserving subtle biological variation and remaining computationally tractable at the scale of millions of cells. Existing approaches often prioritize either batch mixing or preservation of local biological structure, and most cannot natively accommodate spatial coordinates. Here we introduce PRIME (Projection-based Robust Integration via Manifold Embedding), an ensemble integration framework that combines random-projection-based consensus anchoring, graph-Laplacian correction, and optional spatial-neighborhood regularization. Across multiple random projections of the expression manifold, PRIME uses consensus voting to keep only cell pairs that are repeatedly matched, reducing false anchors caused by projection-specific distortions. For ST, PRIME couples this expression-based anchor graph with a coordinate-derived spatial neighborhood graph in a unified graph-Laplacian objective with a closed-form solution, enabling simultaneous cross-batch alignment and local spatial coherence. Based on extensive benchmarking spanning diverse datasets, we show that PRIME consistently outperforms state-of-the-art methods in both batch correction and biological conservation across scRNA-seq and ST integration scenarios and in downstream tasks including trajectory inference, spatial-domain preservation, and perturbation-response analysis. In particular, when integrating a human hematopoiesis benchmark spanning eight donors and approximately 33,000 cells, PRIME preserves biologically coherent developmental trajectories. It also maintains cortical laminar architecture across dorsolateral prefrontal cortex sections in an ST dataset and recovers known drug-target relationships in a perturbation atlas of more than 1 million cells while suppressing batch-associated confounders. Together, these results establish PRIME as a versatile and scalable framework for atlas-level integration of scRNA-seq and ST across diverse biological applications.
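
The two ideas named in the abstract, consensus voting over random projections and a graph-Laplacian correction with a closed-form solution, can be illustrated with a toy NumPy sketch. This is not the PRIME implementation: the abstract does not give the matching rule, the projection type, or the exact objective. The sketch assumes mutual-nearest-neighbor matching, Gaussian random projections, and the objective min_Y ||Y - X||^2 + lam * tr(Y^T L Y), whose minimizer is Y = (I + lam L)^(-1) X. All function names and parameters (`n_proj`, `dim`, `min_votes`, `lam`) are hypothetical.

```python
import numpy as np

def mutual_nearest_pairs(a, b):
    """Pairs (i, j) where a[i] and b[j] are each other's nearest neighbor."""
    # Squared Euclidean distance between every row of a and every row of b.
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    nn_ab = d.argmin(axis=1)  # nearest row of b for each row of a
    nn_ba = d.argmin(axis=0)  # nearest row of a for each row of b
    return {(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i}

def consensus_anchors(a, b, n_proj=20, dim=5, min_votes=15, seed=0):
    """Keep pairs matched in at least min_votes of n_proj random projections."""
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_proj):
        # Assumed projection type: dense Gaussian matrix, shared by both batches.
        r = rng.standard_normal((a.shape[1], dim)) / np.sqrt(dim)
        for pair in mutual_nearest_pairs(a @ r, b @ r):
            votes[pair] = votes.get(pair, 0) + 1
    return sorted(p for p, v in votes.items() if v >= min_votes)

def laplacian_correct(x, edges, lam=1.0):
    """Solve (I + lam * L) Y = X, where L is the Laplacian of the edge graph."""
    n = x.shape[0]
    w = np.zeros((n, n))
    for i, j in edges:
        w[i, j] = w[j, i] = 1.0
    lap = np.diag(w.sum(axis=1)) - w  # unnormalized graph Laplacian
    return np.linalg.solve(np.eye(n) + lam * lap, x)

# Toy data (illustrative only): batch B is batch A plus a small offset.
a = 10.0 * np.eye(8)
b = a + 0.01
anchors = consensus_anchors(a, b)

# Stack both batches and connect anchored cells across batches.
x = np.vstack([a, b])
edges = [(i, len(a) + j) for i, j in anchors]
y = laplacian_correct(x, edges, lam=10.0)
```

In this sketch, each anchored pair is pulled together: for an isolated edge, the difference between the two corrected cells shrinks by a factor of 1 / (1 + 2 * lam). A spatial neighborhood graph could be added as extra edges in the same Laplacian, which is one plausible reading of the "unified" objective the abstract describes. The dense matrices used here would not scale to millions of cells; sparse solvers would be needed.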

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Nature Methods, 336 papers in training set, Top 0.1%, 26.0%
2. Nature Biotechnology, 147 papers in training set, Top 0.2%, 18.7%
3. Nature Communications, 4913 papers in training set, Top 29%, 6.4%
50% of probability mass above
4. Cell Systems, 167 papers in training set, Top 3%, 4.3%
5. Genome Biology, 555 papers in training set, Top 2%, 4.0%
6. Advanced Science, 249 papers in training set, Top 5%, 3.6%
7. Nature Cell Biology, 99 papers in training set, Top 2%, 3.1%
8. Bioinformatics, 1061 papers in training set, Top 6%, 2.6%
9. Nature Genetics, 240 papers in training set, Top 3%, 2.4%
10. Science, 429 papers in training set, Top 12%, 2.1%
11. Genome Medicine, 154 papers in training set, Top 4%, 1.7%
12. Nature, 575 papers in training set, Top 11%, 1.7%
13. Nature Biomedical Engineering, 42 papers in training set, Top 0.8%, 1.7%
14. Cell Genomics, 162 papers in training set, Top 3%, 1.7%
15. Nature Machine Intelligence, 61 papers in training set, Top 2%, 1.3%
16. Nucleic Acids Research, 1128 papers in training set, Top 14%, 1.2%
17. The American Journal of Human Genetics, 206 papers in training set, Top 3%, 1.2%
18. Briefings in Bioinformatics, 326 papers in training set, Top 5%, 1.0%
19. Nature Computational Science, 50 papers in training set, Top 1%, 0.9%
20. Science Advances, 1098 papers in training set, Top 26%, 0.9%
21. Proceedings of the National Academy of Sciences, 2130 papers in training set, Top 41%, 0.9%
22. Nature Chemical Biology, 104 papers in training set, Top 3%, 0.9%
23. Cell Reports, 1338 papers in training set, Top 33%, 0.8%
24. Nature Medicine, 117 papers in training set, Top 5%, 0.7%
25. Nature Microbiology, 133 papers in training set, Top 5%, 0.7%
26. Genome Research, 409 papers in training set, Top 5%, 0.6%
27. Communications Biology, 886 papers in training set, Top 32%, 0.5%