OneGenomeRice (OGR): A Genomic Foundation Model for Rice
Qian, B.; Liang, C.; Qin, C.; Liu, C.; Zhang, C.; Xu, C.; Li, D.; Xue, G.; He, H.; Zhang, H.; He, H.; Chen, D.; Xu, J.; Zhang, J.; Sun, J.; Shang, L.; Jiang, J.; Xia, K.-k.; Zhong, L.; Chen, L.-l.; Fan, L.; Liu, L.; Qin, M.-m.; Li, Q.; Zhu, S.; Ma, S.; Liu, S.; Zhang, S.; Fu, S.; Wei, T.; Xu, X.; Jia, X.; Xu, X.; Jing, Y.; Xu, Y.; Zhao, Y.; Xue, Y.; Guo, Y.; Xiao, Z.; Li, Z.; Li, Z.; Yue, Z.; Deng, Z.
Genomic foundation models are driving the transition of genomics into a predictive-intelligence discipline. While human-centric models have advanced substantially, plant genomics, particularly for staple crops, remains hindered by a lack of such models. Here we introduce OneGenomeRice (OGR), a genomic foundation model for rice (Oryza sativa) built on a Mixture-of-Experts (MoE) transformer architecture with 1.25 billion parameters. OGR was pre-trained on a dataset of 422 high-quality genomes of cultivated and wild rice. In a comprehensive benchmark covering short-sequence motif identification, long-range regulatory modeling, single-nucleotide-resolution prediction, selective sweep detection, and subspecies classification, OGR significantly outperformed existing state-of-the-art plant and all-life genome models in 11 categories. The model was further applied to several downstream tasks, including detecting introgression between the indica and japonica subspecies with embedding-based supervised classification, identifying functional loci associated with agronomic traits from attention-derived importance signals, and predicting gene expression from DNA sequence. These results indicate that OGR is a promising foundational computational infrastructure for functional genomics and precision breeding in rice.