SC-MAMBA2: Leveraging State-Space Models for Efficient Single-Cell Ultra-Long Transcriptome Modeling

Zhao, Y.; Zhao, B.; Zhang, F.; He, C.; Wu, W.; Lai, L.

2024-10-26 cell biology
10.1101/2024.09.30.615775 bioRxiv

Abstract: The rapid advancement of single-cell sequencing technology has significantly deepened our understanding of cellular heterogeneity, yet it also presents substantial challenges for the unified modeling of single-cell data. Meanwhile, pre-trained foundation models have achieved notable success in domains such as natural language processing and image analysis. However, extending these models to accommodate ultra-long single-cell transcriptome sequences, characterized by an extensive number of genes, remains a formidable task. In this study, we introduce SC-MAMBA2, a model based on the MAMBA2 architecture and designed with a bidirectional modeling approach tailored for single-cell transcriptomics data. As the first single-cell foundation model to integrate the state-space models (SSMs) underlying the MAMBA2 architecture, SC-MAMBA2 features over 625 million parameters, covers more than 60,000 genes, and was pre-trained on a dataset of over 57 million cells, making it the most comprehensive solution for processing ultra-long transcriptome sequences. Extensive benchmarking across a diverse array of downstream tasks consistently demonstrates that SC-MAMBA2 surpasses state-of-the-art models, delivering superior accuracy and enhanced computational efficiency.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Nature Machine Intelligence | 61 | Top 0.1% | 22.8%
2 | Nature Communications | 4913 | Top 17% | 10.2%
3 | Advanced Science | 249 | Top 2% | 8.3%
4 | Genome Biology | 555 | Top 2% | 4.0%
5 | Cell Discovery | 54 | Top 1% | 3.6%
6 | Cell Systems | 167 | Top 4% | 3.6%
50% of probability mass above
7 | iScience | 1063 | Top 6% | 3.1%
8 | Communications Biology | 886 | Top 3% | 2.8%
9 | Nature Methods | 336 | Top 3% | 2.6%
10 | PLOS ONE | 4510 | Top 50% | 1.9%
11 | Cell | 370 | Top 11% | 1.7%
12 | Proceedings of the National Academy of Sciences | 2130 | Top 32% | 1.7%
13 | Genomics, Proteomics & Bioinformatics | 171 | Top 3% | 1.7%
14 | Genome Research | 409 | Top 2% | 1.7%
15 | Nature Medicine | 117 | Top 2% | 1.5%
16 | npj Systems Biology and Applications | 99 | Top 1% | 1.5%
17 | Patterns | 70 | Top 1% | 1.5%
18 | Cell Reports | 1338 | Top 27% | 1.3%
19 | Nucleic Acids Research | 1128 | Top 13% | 1.2%
20 | Nature Cell Biology | 99 | Top 3% | 1.2%
21 | Protein & Cell | 25 | Top 2% | 1.2%
22 | Journal of Cell Biology | 333 | Top 3% | 1.0%
23 | Nature Biotechnology | 147 | Top 6% | 1.0%
24 | Journal of Genetics and Genomics | 36 | Top 2% | 1.0%
25 | Heliyon | 146 | Top 5% | 0.9%
26 | Cell Genomics | 162 | Top 6% | 0.8%
27 | Nature | 575 | Top 15% | 0.8%
28 | Bioinformatics | 1061 | Top 9% | 0.8%
29 | Life Science Alliance | 263 | Top 2% | 0.8%
30 | PLOS Computational Biology | 1633 | Top 24% | 0.8%