Rice Annotation Project Database (RAP-DB): literature-curated gene annotation and integrated omics resources for rice functional genomics and molecular breeding

Kawahara, Y.; Kishikawa, T. H.; Hirata, R.; Wang, X.; Tamagaki, Y.; Kumagai, M.; Tabei, N.; Sakai, H.; Itoh, T.

2026-01-21 bioinformatics
10.64898/2026.01.16.699882 bioRxiv

High-throughput sequencing technologies have enabled the generation of high-quality reference genomes for numerous rice cultivars. However, inferring gene functions, associated phenotypes, and causal variants from these sequences remains challenging. The Rice Annotation Project Database (RAP-DB; https://rapdb.dna.affrc.go.jp) is a curated genomic resource that provides comprehensive gene annotations for the reference genome of Oryza sativa ssp. japonica cv. Nipponbare. Since its major update in 2013, gene models and functional annotations have been continuously revised through expert manual curation of newly published literature related to rice genes. As of March 2025, a total of 6,631 transcripts corresponding to 6,371 loci have been curated based on 4,699 peer-reviewed publications. These curated genes are functionally characterized and are frequently associated with agronomic traits, including yield components, stress tolerance, and disease resistance. To support molecular breeding, RAP-DB now provides a curated catalogue of 904 agronomically important loci, including gene symbols, functional descriptions, and associated traits, together with more than 1,000 functionally characterized alleles compiled from the literature. In addition to in-house expert curation, RAP-DB integrates community-curated datasets for major gene families, such as WRKY transcription factors, S-domain receptor-like kinases, and leucine-rich repeat-containing receptors, thereby expanding coverage of key regulatory and defense-related genes. RAP-DB also incorporates reanalyzed RNA sequencing expression profiles alongside microarray-based expression data and co-expression networks, offering gene-centric views of expression patterns across tissues, conditions, and developmental stages. Furthermore, RAP-DB is linked to genome-wide variation datasets from diverse rice varieties through the TASUKE+ genome browser, enabling exploration of allelic diversity across varieties. 
To enhance annotation quality and long-term sustainability, AI-assisted literature screening and a web-based feedback system have been introduced, allowing users to submit corrections to gene models and report newly characterized genes or relevant publications. Together, these developments strengthen RAP-DB as a primary, literature-based gene annotation resource and provide a practical foundation for molecular breeding in rice.

Matching journals

The top 4 journals together account for 52.5% of the predicted probability mass, crossing the 50% mark.

1 | Plant Communications | 35 papers in training set | Top 0.1% | 18.2%
2 | Horticulture Research | 43 papers in training set | Top 0.1% | 14.0%
3 | Molecular Plant | 36 papers in training set | Top 0.1% | 12.1%
4 | Plant Biotechnology Journal | 56 papers in training set | Top 0.1% | 8.2%
50% of probability mass above
5 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 0.9% | 6.7%
6 | Scientific Data | 174 papers in training set | Top 0.5% | 3.5%
7 | Plant Physiology | 217 papers in training set | Top 1% | 3.5%
8 | The Plant Genome | 53 papers in training set | Top 0.2% | 3.5%
9 | Frontiers in Plant Science | 240 papers in training set | Top 3% | 3.0%
10 | Scientific Reports | 3102 papers in training set | Top 51% | 2.0%
11 | PLOS ONE | 4510 papers in training set | Top 51% | 1.8%
12 | Theoretical and Applied Genetics | 46 papers in training set | Top 0.2% | 1.7%
13 | Nucleic Acids Research | 1128 papers in training set | Top 12% | 1.4%
14 | Plant and Cell Physiology | 31 papers in training set | Top 0.2% | 1.3%
15 | Plant Direct | 81 papers in training set | Top 2% | 1.2%
16 | New Phytologist | 309 papers in training set | Top 4% | 1.2%
17 | The Plant Journal | 197 papers in training set | Top 3% | 1.2%
18 | Nature Communications | 4913 papers in training set | Top 59% | 0.9%
19 | Journal of Experimental Botany | 195 papers in training set | Top 3% | 0.9%
20 | International Journal of Molecular Sciences | 453 papers in training set | Top 13% | 0.9%
21 | Frontiers in Genetics | 197 papers in training set | Top 9% | 0.8%
22 | Communications Biology | 886 papers in training set | Top 25% | 0.7%
23 | Database | 51 papers in training set | Top 1% | 0.7%
24 | Bioinformatics | 1061 papers in training set | Top 10% | 0.7%
25 | Plant Science | 25 papers in training set | Top 1% | 0.6%