
OneGenomeRice (OGR): A Genomic Foundation Model for Rice

Qian, B.; Liang, C.; Qin, C.; Liu, C.; Zhang, C.; Xu, C.; Li, D.; Xue, G.; He, H.; Zhang, H.; He, H.; Chen, D.; Xu, J.; Zhang, J.; Sun, J.; Shang, L.; Jiang, J.; Xia, K.-k.; Zhong, L.; Chen, L.-l.; Fan, L.; Liu, L.; Qin, M.-m.; Li, Q.; Zhu, S.; Ma, S.; Liu, S.; Zhang, S.; Fu, S.; Wei, T.; Xu, X.; Jia, X.; Xu, X.; Jing, Y.; Xu, Y.; Zhao, Y.; Xue, Y.; Guo, Y.; Xiao, Z.; Li, Z.; Li, Z.; Yue, Z.; Deng, Z.

2026-04-23 genomics
10.64898/2026.04.21.719822 bioRxiv

The transition of genomics to a predictive intelligence discipline is driven by the advent of genomic foundation models. While substantial progress has been made in human-centric models, plant genomics, particularly for staple crops, remains hindered by a lack of such models. Here we introduce OneGenomeRice (OGR), a genomic foundation model for rice (Oryza sativa) built on a Mixture of Experts (MoE) transformer architecture with 1.25 billion parameters. OGR was pre-trained on a genomic dataset comprising 422 high-quality genomes of cultivated and wild rice. A comprehensive benchmark, covering short-sequence motif identification, long-range regulatory modeling, single-nucleotide resolution prediction, selective sweep detection, and subspecies classification, demonstrated that OGR significantly outperforms existing state-of-the-art plant or all-life genome models in 11 categories. The model was further applied to several downstream tasks: detecting introgression between indica and japonica subspecies using embedding-based supervised classification, identifying agronomic trait-associated functional loci through attention-derived importance signals, and predicting gene expression from DNA sequences. These results indicate that OGR is a promising foundational computational infrastructure for functional genomics and precision breeding of rice.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Plant Communications | 35 papers in training set | Top 0.1% | 18.1%
2. Molecular Plant | 36 papers in training set | Top 0.1% | 8.2%
3. Genome Biology | 555 papers in training set | Top 1% | 6.2%
4. Cell Genomics | 162 papers in training set | Top 0.6% | 6.2%
5. Nature Communications | 4913 papers in training set | Top 36% | 4.2%
6. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 1% | 3.9%
7. Plant Biotechnology Journal | 56 papers in training set | Top 0.3% | 3.9%

50% of probability mass above

8. Nucleic Acids Research | 1128 papers in training set | Top 7% | 3.0%
9. Horticulture Research | 43 papers in training set | Top 0.7% | 3.0%
10. Briefings in Bioinformatics | 326 papers in training set | Top 3% | 2.0%
11. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 3% | 2.0%
12. Advanced Science | 249 papers in training set | Top 10% | 1.8%
13. The Plant Journal | 197 papers in training set | Top 2% | 1.8%
14. Genome Medicine | 154 papers in training set | Top 4% | 1.8%
15. NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.8%
16. Nature Plants | 84 papers in training set | Top 1.0% | 1.8%
17. Communications Biology | 886 papers in training set | Top 10% | 1.6%
18. Nature Genetics | 240 papers in training set | Top 5% | 1.6%
19. PLOS Computational Biology | 1633 papers in training set | Top 17% | 1.6%
20. Nature Machine Intelligence | 61 papers in training set | Top 2% | 1.6%
21. Bioinformatics | 1061 papers in training set | Top 7% | 1.6%
22. Genome Research | 409 papers in training set | Top 2% | 1.6%
23. Frontiers in Genetics | 197 papers in training set | Top 5% | 1.6%
24. Bioinformatics Advances | 184 papers in training set | Top 4% | 1.3%
25. Journal of Genetics and Genomics | 36 papers in training set | Top 1% | 1.2%
26. GigaScience | 172 papers in training set | Top 2% | 1.1%
27. Frontiers in Plant Science | 240 papers in training set | Top 4% | 0.9%
28. New Phytologist | 309 papers in training set | Top 4% | 0.9%
29. IEEE Transactions on Computational Biology and Bioinformatics | 17 papers in training set | Top 0.5% | 0.9%
30. The Plant Genome | 53 papers in training set | Top 0.7% | 0.7%
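
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running sum over the ranked probabilities. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, and the values are the rounded percentages shown for the first ten journals, so the totals are approximate.

```python
from itertools import accumulate

# Rounded predicted probabilities (percent) for journals ranked 1-10,
# copied from the ranked list above.
probabilities = [18.1, 8.2, 6.2, 6.2, 4.2, 3.9, 3.9, 3.0, 3.0, 2.0]


def journals_to_reach(mass_percent, probs):
    """Return how many top-ranked journals are needed for the cumulative
    probability to reach mass_percent, or None if it is never reached."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= mass_percent:
            return count
    return None


top_n = journals_to_reach(50.0, probabilities)
cumulative_at_top_n = sum(probabilities[:top_n])
print(top_n, round(cumulative_at_top_n, 1))
```

With these rounded values, the running total first reaches 50% at the seventh journal, which matches the cutoff marked in the list.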