Toward mastering the cell language by learning to generate
Chen, Y.; Bian, H.; Wei, L.; Jia, J.; Dong, X.; Li, Y.; Zhao, Y.; Wu, X.; Li, C.; Luo, E.; Xiao, C.; Hao, M.; Zhang, X.
Cells can be viewed as complex stories written by the coordinated expression of genes. The success of AI large language models (LLMs) in mastering human language inspired us to develop scMulan, a large AI model with 368 million parameters that learns the cell language in order to generate cell transcriptomics with designated attributes. We defined a unified c-sentence that incorporates cell transcriptomics and meta-attributes, and pre-trained scMulan on the equivalent of 100 million human cells. Experiments showed that scMulan can generate designated pseudo transcriptomics, predict missing attributes of cells, reconstruct unobserved cells along functional gradients, and help identify driving regulators of cell fates. The generated data passed tests of current tools and reflect the underlying biology.
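To make the idea of a c-sentence more concrete, the following is a minimal sketch of how a cell's meta-attributes and expressed genes might be flattened into one token sequence. The abstract does not specify the actual format, so everything here is an assumption for illustration: the `CSentence` class, the `build_c_sentence` function, the `<key:value>` and `GENE|level` token shapes, and the binning of expression values are hypothetical and should not be read as scMulan's implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CSentence:
    """Hypothetical container: meta-attribute tokens followed by gene tokens."""
    tokens: List[str]


def build_c_sentence(
    expression: Dict[str, float],
    meta: Dict[str, str],
    n_bins: int = 10,
) -> CSentence:
    # Meta-attributes become tokens such as "<organ:heart>".
    # The token format is an assumption, not taken from the paper.
    meta_tokens = [f"<{key}:{value}>" for key, value in sorted(meta.items())]

    # Keep only expressed genes, highest expression first (ties broken by name).
    expressed = sorted(
        ((gene, value) for gene, value in expression.items() if value > 0),
        key=lambda item: (-item[1], item[0]),
    )
    max_value = expressed[0][1] if expressed else 1.0

    gene_tokens = []
    for gene, value in expressed:
        # Map expression onto an integer level in 1..n_bins, relative to the
        # cell's maximum. This binning scheme is an assumption.
        level = max(1, math.ceil(n_bins * value / max_value))
        gene_tokens.append(f"{gene}|{level}")

    return CSentence(tokens=meta_tokens + gene_tokens)


if __name__ == "__main__":
    cell = build_c_sentence(
        expression={"GATA4": 4.0, "MYH6": 8.0, "ALB": 0.0},
        meta={"organ": "heart", "cell_type": "cardiomyocyte"},
        n_bins=4,
    )
    print(cell.tokens)
```

In a representation like this, generating a cell with designated attributes would amount to supplying the meta-attribute tokens as a prompt and letting a language model continue with gene tokens, while predicting a missing attribute would amount to completing an attribute token given the genes. How scMulan actually orders and encodes these parts is not stated in the abstract.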