
Using ARCADE (ARChaeplastida Annotation DatabasE) to understand the evolution of genome size in land plants

Menezes, A. P. A.; Almeida, J. V. d. A.; Del-Bem, L.-E.; Lobo, F. P.

2022-09-08 plant biology
10.1101/2022.09.06.506765 bioRxiv

The abundance of plant genomic information resulting from the decrease in sequencing costs contrasts with the lack of databases that integrate genome annotation, taxonomy and phenotypes to produce statistically sound, biologically meaningful knowledge. Here we present ARCADE (ARChaeplastida Annotation DatabasE), a database of 171 high-quality archaeplastidian non-redundant proteomes gathered from six primary genomic databases, together with proteome quality metrics and a growing amount of associated metadata. As a case study to demonstrate the usefulness of ARCADE, we used it to investigate the expansion and contraction of protein domains associated with the evolution of genome size (hereafter GS). GS varies greatly among land plants, and the synthesis of large genomes can be costly to cells. Although GS has been studied extensively for decades, the molecular mechanisms involved in the adaptation of plants to increases in GS are still poorly understood. We used the annotation and phylogenetic information available in ARCADE, together with estimated GS values available for 83 land plant species, to search for associations between the abundance of protein domain families in these species and GS variation through phylogeny-aware methods. Additionally, we estimated the GS for the ancestral nodes of the extant land plant species. GS seems to be decreasing over the course of evolution, except for a few branches that might have undergone independent GS increases. We found 7 Pfam families correlated with the variation in GS in land plants, mainly related to nucleotide metabolism, DNA repair and genome organization. We found larger genomes to have a greater frequency of the Histone 2A superfamily, responsible for diverse functions, including nucleosome formation and silencing of transposable elements. The molecular functions we found correlated with GS variation suggest that they may be associated with preserving genome stability in larger genomes, and might indicate the evolution of mechanisms to cope with the variation in GS in land plants. ARCADE is available at https://bit.ly/ARCADE_OSF.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Journal of Molecular Evolution: 21 papers in training set, Top 0.1%, 9.9%
2. The Plant Journal: 197 papers in training set, Top 0.4%, 9.9%
3. Genome Biology and Evolution: 280 papers in training set, Top 0.2%, 6.3%
4. Plant Physiology: 217 papers in training set, Top 0.8%, 6.2%
5. GigaScience: 172 papers in training set, Top 0.3%, 4.8%
6. Frontiers in Plant Science: 240 papers in training set, Top 2%, 4.8%
7. Plant Communications: 35 papers in training set, Top 0.3%, 3.9%
8. Plant Direct: 81 papers in training set, Top 0.5%, 3.9%
9. PLOS ONE: 4510 papers in training set, Top 37%, 3.9%

50% of probability mass above

10. Scientific Reports: 3102 papers in training set, Top 38%, 3.5%
11. PeerJ: 261 papers in training set, Top 4%, 2.7%
12. Journal of Genetics and Genomics: 36 papers in training set, Top 0.5%, 2.7%
13. Database: 51 papers in training set, Top 0.4%, 1.7%
14. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 4%, 1.7%
15. BMC Genomics: 328 papers in training set, Top 3%, 1.6%
16. BMC Plant Biology: 47 papers in training set, Top 0.4%, 1.6%
17. New Phytologist: 309 papers in training set, Top 3%, 1.5%
18. F1000Research: 79 papers in training set, Top 2%, 1.3%
19. Journal of Experimental Botany: 195 papers in training set, Top 2%, 1.2%
20. Frontiers in Genetics: 197 papers in training set, Top 7%, 1.2%
21. PLOS Computational Biology: 1633 papers in training set, Top 21%, 1.1%
22. Molecular Plant: 36 papers in training set, Top 1%, 1.1%
23. Molecular Biology and Evolution: 488 papers in training set, Top 4%, 0.9%
24. Genome Biology: 555 papers in training set, Top 7%, 0.9%
25. The Plant Cell: 141 papers in training set, Top 2%, 0.8%
26. Journal of Structural Biology: 58 papers in training set, Top 1%, 0.8%
27. iScience: 1063 papers in training set, Top 33%, 0.7%
28. Genomics: 60 papers in training set, Top 3%, 0.7%
29. International Journal of Molecular Sciences: 453 papers in training set, Top 17%, 0.7%
30. BMC Biology: 248 papers in training set, Top 6%, 0.6%
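
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is only an illustration, not the page's own method: it assumes the rounded percentages shown in the list are accurate enough for the check, and the function name `journals_to_reach` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (%) for the twelve top-ranked journals,
# copied from the list above (rounded values as displayed).
probabilities = [9.9, 9.9, 6.3, 6.2, 4.8, 4.8, 3.9, 3.9, 3.9, 3.5, 2.7, 2.7]


def journals_to_reach(probs, threshold=50.0):
    """Return the smallest number of top-ranked entries whose summed
    probability reaches the threshold, or None if it is never reached."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


print(journals_to_reach(probabilities))  # prints 9
```

With these values the first eight journals sum to about 49.7% and the first nine to about 53.6%, which matches the stated cutoff after rank 9.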