A lightweight, ultrafast and general embedding framework for large-scale spatial omics data
Dai, B.; Liang, Y.; Yi, L.; Hu, P.; Song, Y.; Qian, B.; He, M.; Wang, L.; Yuan, Z.; Zuo, Y.
Deciphering ultra-large-scale omics data with minimal resources while maintaining high computational efficiency is a longstanding challenge in biology. Here, we present Local Pooling (LP), a lightweight, ultrafast and general framework that leverages a neighbor-indexing strategy and a local pooling module to generate omics embeddings compatible with a variety of downstream analyses. We developed its adaptation for spatial omics, SpaLP, and evaluated it on over 20 large-scale datasets spanning 9 technology platforms. SpaLP consistently outperformed baseline methods across multiple benchmarks, including niche identification, expression reconstruction, multi-slice integration, 3D organ construction, multi-omics integration and cross-platform generalization. Notably, SpaLP processed a 1.35-million-cell mouse embryo slice in just 47 seconds, achieving up to a 300-fold increase in computational efficiency compared with Graph Neural Network (GNN)-based methods. Meanwhile, SpaLP increased the average adjusted Rand index (ARI) by over 30% for niche identification in simulated and realistic settings. Furthermore, we applied SpaLP to integrate 8.4 million mouse brain cells within 4 minutes on a single GPU and constructed a 3D spatial atlas. Finally, we explored SpaLP's capacity for cross-platform generalization and its potential for developing an omics foundation model. We believe that LP, as a novel and general framework, could help more researchers develop new models on large-scale data and overcome the research barriers caused by limited computing resources in more fields.
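The abstract names a neighbor-indexing strategy and a local pooling module but does not spell out either one. As a rough illustration only, the sketch below assumes that neighbor indexing means storing, for each cell, the indices of its k nearest spatial neighbors, and that local pooling means averaging features over those neighbors. The function names `knn_index` and `local_mean_pool`, the choice of mean pooling, and the toy data are all hypothetical and are not taken from SpaLP.

```python
import numpy as np

def knn_index(coords, k):
    """Return an (n, k) integer array: row i holds the indices of the k
    nearest spatial neighbors of cell i (the cell itself included)."""
    # Brute-force pairwise squared distances; suitable for a toy example only.
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1, kind="stable")[:, :k]

def local_mean_pool(features, index):
    """Average each cell's features over its indexed neighbors
    (assumed pooling rule, not the published one)."""
    # features[index] has shape (n, k, d): a plain gather, no graph operations.
    return features[index].mean(axis=1)

# Hypothetical toy data: two spatially separated pairs of cells, two features.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])

index = knn_index(coords, k=2)
pooled = local_mean_pool(features, index)
print(index)
print(pooled)
```

In this toy case each cell is pooled with its single closest partner, so cells in the same spatial pair end up with identical smoothed features. A fixed-size index array turns aggregation into a dense gather followed by a reduction, which is one plausible reason such a design could be cheaper than message passing in a GNN; the brute-force distance computation here would need to be replaced by a spatial index for data at the scale the abstract describes.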
Matching journals
The top 6 journals account for 50% of the predicted probability mass.