A lightweight, ultrafast and general embedding framework for large-scale spatial omics data

Dai, B.; Liang, Y.; Yi, L.; Hu, P.; Song, Y.; Qian, B.; He, M.; Wang, L.; Yuan, Z.; Zuo, Y.

2026-02-06 bioinformatics
10.64898/2026.02.04.703814 bioRxiv

Deciphering ultra-large-scale omics data with minimal resources while maintaining high computational efficiency is a longstanding challenge in biology. Here, we present Local Pooling (LP), a lightweight, ultrafast and general framework that leverages a neighbor-indexing strategy and a local pooling module to generate omics embeddings compatible with a variety of downstream analyses. We developed its adaptation for spatial omics, SpaLP, and evaluated it on over 20 large-scale datasets spanning 9 technology platforms. SpaLP consistently outperformed baseline methods across multiple benchmarks, including niche identification, expression reconstruction, multi-slice integration, 3D organ construction, multi-omics integration and cross-platform generalization. Notably, SpaLP processed a 1.35-million-cell mouse embryo slice in just 47 seconds, achieving up to a 300-fold increase in computational efficiency compared with Graph Neural Network (GNN)-based methods. Meanwhile, SpaLP increased the average adjusted Rand index (ARI) by over 30% for niche identification in simulated and realistic settings. Furthermore, we applied SpaLP to integrate 8.4 million mouse brain cells within 4 minutes on a single GPU and constructed a 3D spatial atlas. Finally, we explored SpaLP's capacity for cross-platform generalization and its potential for developing an omics foundation model. We believe that LP, as a novel and general framework, could help more researchers develop new models on large-scale data and overcome the research barriers imposed by limited computing resources in more fields.
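
The abstract names a neighbor-indexing strategy and a local pooling module but does not spell out either. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below builds a k-nearest-neighbor index over spatial coordinates and averages each cell's features over its indexed neighbors. The function names (`knn_index`, `local_mean_pool`), the brute-force neighbor search, the choice of mean pooling, and the toy data are all assumptions made for this example.

```python
from math import dist


def knn_index(coords, k):
    """For each point, return the indices of its k nearest points (itself included).

    Brute-force search, chosen for clarity; an assumption, not the paper's method.
    """
    index = []
    for p in coords:
        # Sort all points by distance to p; ties are broken by index.
        order = sorted(range(len(coords)), key=lambda j: (dist(p, coords[j]), j))
        index.append(order[:k])
    return index


def local_mean_pool(features, index):
    """Average the feature vectors of each point's indexed neighbors."""
    dim = len(features[0])
    pooled = []
    for neighbours in index:
        pooled.append(
            [sum(features[j][d] for j in neighbours) / len(neighbours) for d in range(dim)]
        )
    return pooled


# Toy data (hypothetical): four cells forming two spatially separated pairs.
coords = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]

index = knn_index(coords, k=2)
embeddings = local_mean_pool(features, index)
```

In this toy case each cell is pooled with its one close neighbor, so the two cells in a pair end up with identical vectors. Because the neighbor index is computed once and pooling is a plain gather-and-average, no message passing over a graph is needed, which is one plausible reading of why such a design could be cheap at scale.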

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 0.2% | 18.2%
2 | Briefings in Bioinformatics | 326 papers in training set | Top 0.4% | 10.2%
3 | Nature Communications | 4913 papers in training set | Top 23% | 8.2%
4 | Nucleic Acids Research | 1128 papers in training set | Top 3% | 6.2%
5 | Bioinformatics | 1061 papers in training set | Top 5% | 4.2%
6 | Genome Biology | 555 papers in training set | Top 2% | 4.2%
50% of probability mass above
7 | Advanced Science | 249 papers in training set | Top 6% | 3.5%
8 | Nature Machine Intelligence | 61 papers in training set | Top 1% | 3.5%
9 | Molecular Plant | 36 papers in training set | Top 0.4% | 3.5%
10 | Genome Medicine | 154 papers in training set | Top 3% | 2.3%
11 | Cell Systems | 167 papers in training set | Top 6% | 2.0%
12 | PLOS Computational Biology | 1633 papers in training set | Top 14% | 2.0%
13 | Nature Methods | 336 papers in training set | Top 4% | 2.0%
14 | Genome Research | 409 papers in training set | Top 2% | 2.0%
15 | Cell Genomics | 162 papers in training set | Top 3% | 1.8%
16 | Journal of Genetics and Genomics | 36 papers in training set | Top 0.9% | 1.7%
17 | Science China Life Sciences | 26 papers in training set | Top 1% | 1.7%
18 | Patterns | 70 papers in training set | Top 1% | 1.3%
19 | Nature Computational Science | 50 papers in training set | Top 0.9% | 1.3%
20 | BMC Bioinformatics | 383 papers in training set | Top 6% | 1.2%
21 | Communications Biology | 886 papers in training set | Top 20% | 0.9%
22 | Nature Biotechnology | 147 papers in training set | Top 7% | 0.8%
23 | iScience | 1063 papers in training set | Top 34% | 0.7%
24 | Bioinformatics Advances | 184 papers in training set | Top 5% | 0.7%
25 | Protein & Cell | 25 papers in training set | Top 3% | 0.7%
26 | Cell Reports Methods | 141 papers in training set | Top 6% | 0.7%
27 | Cell Discovery | 54 papers in training set | Top 5% | 0.7%
28 | Plant Communications | 35 papers in training set | Top 2% | 0.7%
29 | Frontiers in Genetics | 197 papers in training set | Top 11% | 0.7%
30 | Cell Research | 49 papers in training set | Top 3% | 0.6%
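
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below does that for the first seven entries. The helper name `journals_to_reach` is hypothetical, and the displayed percentages are presumably rounded, so the running total is only approximate.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, running_total) at the first rank where the total meets the threshold.

    Returns None if the threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None


# Percentages for ranks 1-7 as listed above.
listed = [18.2, 10.2, 8.2, 6.2, 4.2, 4.2, 3.5]

count, total = journals_to_reach(listed, 50.0)
print(count, round(total, 1))
```

With the listed values the running total stays below 50% after five journals and first crosses it at the sixth, which matches the position of the "50% of probability mass above" marker.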