A comprehensive genealogy of the replication associated protein of CRESS DNA viruses reveals a single origin of intron-containing Rep

Zhao, L.; Lavington, E.; Duffy, S.

bioRxiv, 2019-07-01, evolutionary biology
doi: 10.1101/687855

Abundant novel circular Rep-encoding ssDNA viruses (CRESS DNA viruses) have been discovered in the past decade, prompting a new appreciation for the ubiquity and genomic diversity of this group of viruses. Although highly divergent in the hosts they infect or are associated with, CRESS DNA viruses are united by the homologous replication-associated protein (Rep). An accurate genealogy of Rep can therefore provide insights into how these diverse families are related to each other. We used a dataset of eukaryote-associated CRESS DNA RefSeq genomes (n=926), which included representatives from all six established families and unclassified species. To assure an optimal Rep genealogy, we derived and tested a bespoke amino acid substitution model (named CRESS), which outperformed existing protein matrices in describing the evolution of Rep. The CRESS model-estimated Rep genealogy resolved the monophyly of Bacilladnaviridae and the reciprocal monophyly of Nanoviridae and the alpha-satellites when trees estimated with general matrices like LG did not. The most intriguing, previously unobserved result is a likely single origin of intron-containing Reps, which causes several geminivirus genera to group with Genomoviridae (bootstrap support 55%, aLRT SH-like support 0.997, 0.91-0.997 in trees estimated with established matrices). This grouping, which eliminates the monophyly of Geminiviridae, is supported by both domains of Rep, and appears to be related to our use of all RefSeq Reps instead of subsampling to get a smaller dataset. In addition to producing a trustworthy Rep genealogy, the derived CRESS matrix is proving useful for other analyses; it best fit alignments of capsid protein sequences from several CRESS DNA families and parvovirus NS1/Rep sequences.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|---|---|---|---|---|
| 1 | Virus Evolution | 140 | Top 0.1% | 46.4% |
| 2 | Journal of Virology | 456 | Top 1% | 4.5% |
| 3 | PLOS Biology | 408 | Top 2% | 4.5% |
| 4 | PLOS Pathogens | 721 | Top 3% | 3.7% |
| 5 | Proceedings of the National Academy of Sciences | 2130 | Top 22% | 3.2% |
| 6 | Nature Communications | 4913 | Top 43% | 2.9% |
| 7 | eLife | 5422 | Top 34% | 2.2% |
| 8 | Molecular Biology and Evolution | 488 | Top 2% | 2.2% |
| 9 | Communications Biology | 886 | Top 6% | 2.0% |
| 10 | PLOS Computational Biology | 1633 | Top 14% | 2.0% |
| 11 | Viruses | 318 | Top 2% | 1.9% |
| 12 | Nature Ecology & Evolution | 113 | Top 2% | 1.8% |
| 13 | Nature Microbiology | 133 | Top 3% | 1.4% |
| 14 | Science Advances | 1098 | Top 21% | 1.4% |
| 15 | Science | 429 | Top 16% | 1.4% |
| 16 | PLOS ONE | 4510 | Top 62% | 1.0% |
| 17 | Peer Community Journal | 254 | Top 3% | 0.8% |
| 18 | mBio | 750 | Top 11% | 0.8% |
| 19 | NAR Genomics and Bioinformatics | 214 | Top 3% | 0.8% |
| 20 | Scientific Reports | 3102 | Top 74% | 0.8% |
| 21 | Genome Biology and Evolution | 280 | Top 2% | 0.8% |
| 22 | Genome Biology | 555 | Top 8% | 0.7% |
| 23 | Current Biology | 596 | Top 15% | 0.7% |
| 24 | Bioinformatics | 1061 | Top 10% | 0.7% |
| 25 | iScience | 1063 | Top 39% | 0.5% |
| 26 | Microbiome | 139 | Top 4% | 0.5% |
| 27 | Nucleic Acids Research | 1128 | Top 20% | 0.5% |