
The Use of GC-, Codon-, and Amino Acid-frequencies to Understand the Evolutionary Forces at a Genomic Scale.

Elofsson, A.

2019-12-03 evolutionary biology
10.1101/863142 bioRxiv

It is well known that GC content varies enormously between organisms; this is believed to be caused by a combination of mutational preferences and selective pressure. Within coding regions, the variation in GC is larger in codon position three and smaller in positions one and two. Less well known is that this variation also has an enormous impact on the frequency of amino acids, as their codons vary in GC content. For instance, the fraction of alanines in different proteomes varies from 1.1% to 16.5%. In general, the frequency of an amino acid correlates strongly with the number of codons that encode it, the GC content of these codons, and the genomic GC content. However, there are clear and systematic deviations from the expected frequencies. Some amino acids are more frequent than expected by chance, while others are less frequent. A plausible model to explain this is that two different selective forces act on the genes: first, a force acting to maintain the overall GC level, and second, a selective force acting at the amino acid level. Here, we use the divergence in amino acid frequency from what is expected given the GC content to analyze the selective pressure acting on codon frequencies in the three kingdoms of life. We find four major selective forces. First, the frequency of serine is lower than expected in all genomes, but most so in prokaryotes. Second, there exists a selective pressure acting to balance positively and negatively charged amino acids, which results in a reduction of arginine and all the negatively charged amino acids. Third, the frequency of the hydrophobic residues encoded by a T in the second codon position does not change with GC, and their frequency is lower in eukaryotes than in prokaryotes. Finally, some amino acids with unique properties, such as glycine and proline, are limited in their frequency variation.
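The notion of an amino acid frequency "expected by the GC content" can be illustrated with a minimal sketch. It assumes a deliberately simple null model that is not necessarily the one used in the paper: every codon position draws G or C with probability gc/2 each and A or T with probability (1 - gc)/2 each, independently, and stop codons are discarded. The function name `expected_aa_frequencies` is hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expected_aa_frequencies(gc):
    """Expected amino acid frequencies under an assumed neutral model.

    Assumption (not from the paper): each codon position is independent,
    with P(G) = P(C) = gc / 2 and P(A) = P(T) = (1 - gc) / 2.
    Stop codons are excluded and the remainder is renormalised to sum to 1.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be a fraction between 0 and 1")
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}

    freqs = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons do not contribute to the proteome
        p = base_prob[codon[0]] * base_prob[codon[1]] * base_prob[codon[2]]
        freqs[aa] = freqs.get(aa, 0.0) + p

    total = sum(freqs.values())
    return {aa: p / total for aa, p in freqs.items()}


if __name__ == "__main__":
    # Compare an AT-rich and a GC-rich genome for two contrasting residues.
    for gc in (0.3, 0.5, 0.7):
        f = expected_aa_frequencies(gc)
        print(f"GC={gc:.1f}  Ala={f['A']:.3f}  Lys={f['K']:.3f}")
```

Under this model, alanine (codons GCN) becomes more frequent as GC rises, while lysine (AAA, AAG) becomes rarer. Comparing such expected values with observed proteome frequencies is one way to expose the kind of systematic deviations the abstract describes.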

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Molecular Biology and Evolution | 488 papers in training set | Top 0.1% | 21.3%
2. Journal of Molecular Evolution | 21 papers in training set | Top 0.1% | 13.6%
3. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 7% | 9.6%
4. Scientific Reports | 3102 papers in training set | Top 21% | 6.0%
(50% of probability mass above)
5. Genome Biology and Evolution | 280 papers in training set | Top 0.3% | 6.0%
6. BMC Ecology and Evolution | 49 papers in training set | Top 0.6% | 2.6%
7. PLOS Computational Biology | 1633 papers in training set | Top 12% | 2.6%
8. iScience | 1063 papers in training set | Top 8% | 2.5%
9. Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 2% | 2.0%
10. Evolution | 199 papers in training set | Top 1% | 2.0%
11. Bioinformatics | 1061 papers in training set | Top 7% | 1.6%
12. PeerJ | 261 papers in training set | Top 8% | 1.6%
13. Genetics | 225 papers in training set | Top 3% | 1.6%
14. PLOS Genetics | 756 papers in training set | Top 9% | 1.6%
15. Evolution Letters | 71 papers in training set | Top 1% | 1.6%
16. PLOS ONE | 4510 papers in training set | Top 58% | 1.4%
17. Journal of Evolutionary Biology | 98 papers in training set | Top 0.7% | 1.3%
18. Proceedings of the Royal Society B: Biological Sciences | 341 papers in training set | Top 5% | 1.1%
19. Frontiers in Ecology and Evolution | 60 papers in training set | Top 3% | 0.9%
20. eLife | 5422 papers in training set | Top 54% | 0.8%
21. Genes | 126 papers in training set | Top 3% | 0.7%
22. BMC Biology | 248 papers in training set | Top 5% | 0.7%
23. F1000Research | 79 papers in training set | Top 5% | 0.7%
24. Journal of Molecular Biology | 217 papers in training set | Top 4% | 0.7%
25. Journal of The Royal Society Interface | 189 papers in training set | Top 5% | 0.7%
26. Current Biology | 596 papers in training set | Top 15% | 0.7%
27. Journal of Theoretical Biology | 144 papers in training set | Top 2% | 0.6%
28. Nucleic Acids Research | 1128 papers in training set | Top 20% | 0.6%