Protein evolution is structure dependent and non-homogeneous across the tree of life

Pandey, A.; Braun, E. L.

2020-01-29 evolutionary biology
10.1101/2020.01.28.923458 bioRxiv

Motivation: Protein sequence evolution is a complex process that varies among sites within proteins and across the tree of life. Comparisons of evolutionary rate matrices for specific taxa (clade-specific models) have the potential to reveal this variation and provide information about the underlying reasons for those changes. To study changes in patterns of protein sequence evolution we estimated and compared clade-specific models in a way that acknowledged variation within proteins due to structure.

Results: Clade-specific model fit was able to correctly classify proteins from four specific groups (vertebrates, plants, oomycetes, and yeasts) more than 70% of the time. This was true whether we used mixture models that incorporate relative solvent accessibility or simple models that treat sites as homogeneous. Thus, protein evolution is non-homogeneous over the tree of life. However, a small number of dimensions could explain the differences among models (for mixture models, ~50% of the variance reflected relative solvent accessibility and ~25% reflected clade). Relaxed purifying selection in taxa with lower long-term effective population sizes appears to explain much of the among-clade variance. Relaxed selection on solvent-exposed sites was correlated with changes in amino acid side-chain volume; other differences among models were more complex. Beyond the information they reveal about protein evolution, our clade-specific models also represent tools for phylogenomic inference.

Availability: Model files are available from https://github.com/ebraun68/clade_specific_prot_models.

Contact: ebraun68@ufl.edu

Supplementary information: Supplementary data are appended to this preprint.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Molecular Biology and Evolution | 488 papers in training set | Top 0.1% | 21.4%
2 | Genome Biology and Evolution | 280 papers in training set | Top 0.1% | 9.6%
3 | Journal of Molecular Evolution | 21 papers in training set | Top 0.1% | 8.0%
4 | PLOS Biology | 408 papers in training set | Top 1% | 6.0%
5 | PLOS Computational Biology | 1633 papers in training set | Top 6% | 6.0%

50% of probability mass above

6 | Bioinformatics | 1061 papers in training set | Top 4% | 6.0%
7 | Evolution Letters | 71 papers in training set | Top 0.5% | 4.1%
8 | Protein Science | 221 papers in training set | Top 0.3% | 3.8%
9 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 19% | 3.8%
10 | BMC Ecology and Evolution | 49 papers in training set | Top 0.6% | 2.5%
11 | Proceedings of the Royal Society B: Biological Sciences | 341 papers in training set | Top 3% | 2.5%
12 | Evolution | 199 papers in training set | Top 1% | 1.8%
13 | PeerJ | 261 papers in training set | Top 8% | 1.6%
14 | PLOS ONE | 4510 papers in training set | Top 56% | 1.6%
15 | Journal of Evolutionary Biology | 98 papers in training set | Top 0.6% | 1.4%
16 | Genetics | 225 papers in training set | Top 3% | 1.3%
17 | Virus Evolution | 140 papers in training set | Top 1.0% | 1.3%
18 | Frontiers in Ecology and Evolution | 60 papers in training set | Top 3% | 1.2%
19 | Scientific Reports | 3102 papers in training set | Top 68% | 1.2%
20 | eLife | 5422 papers in training set | Top 54% | 0.8%
21 | PLOS Genetics | 756 papers in training set | Top 16% | 0.7%
22 | F1000Research | 79 papers in training set | Top 5% | 0.7%
23 | Methods in Ecology and Evolution | 160 papers in training set | Top 2% | 0.7%
24 | Molecular Ecology | 304 papers in training set | Top 5% | 0.6%
25 | Current Biology | 596 papers in training set | Top 16% | 0.6%
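
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked against the listed values. The following is a minimal Python sketch, not part of the prediction tool itself: the function name `journals_needed` is hypothetical, and the probabilities are the rounded percentages shown in the list, so the running totals are approximate.

```python
# Predicted probabilities (percent) for the 25 listed journals, in rank order.
# These are the rounded values shown in the list, so sums are approximate.
probabilities = [
    21.4, 9.6, 8.0, 6.0, 6.0,   # ranks 1-5
    6.0, 4.1, 3.8, 3.8, 2.5,    # ranks 6-10
    2.5, 1.8, 1.6, 1.6, 1.4,    # ranks 11-15
    1.3, 1.3, 1.2, 1.2, 0.8,    # ranks 16-20
    0.7, 0.7, 0.7, 0.6, 0.6,    # ranks 21-25
]


def journals_needed(probs, threshold):
    """Return (count, running_total) for the first rank at which the
    cumulative probability reaches the threshold, or (None, total) if
    the listed entries never reach it. Hypothetical helper for illustration."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, total = journals_needed(probabilities, 50.0)
print(f"{count} journals reach {total:.1f}% of the predicted probability mass")
```

With the listed values, the running total first reaches the 50% threshold at rank 5, which matches the position of the "50% of probability mass above" marker.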