
Codon degeneracy as well as pretermination codons influence transition to transversion ratio in coding sequences

Beura, P. K.; Aziz, R.; Sen, P.; Das, S.; Namsa, N. D.; Feil, E.; Satapathy, S. S.; Ray, S. K.

2022-11-13 evolutionary biology
10.1101/2022.11.03.515082 bioRxiv

Transitions (ti) and transversions (tv) are the major causes of genome variation. Accurate estimation of the ti to tv ratio (ti/tv) in genomes is crucial for understanding mutational and selection processes in organisms, as the ratio is influenced by both codon degeneracy and pretermination codons (PTC). Therefore, we developed a method (accessible at https://github.com/CBBILAB/CBBI.git) to estimate the ti/tv ratio by accounting for codon degeneracy as well as PTC in protein coding sequences. Our findings revealed a distinct impact of codon degeneracy and PTC on the ti/tv ratio in the Escherichia coli genome. The frequencies of the different base substitutions in the E. coli genome decreased in the order synonymous transition (Sti) > synonymous transversion (Stv) > non-synonymous transition (Nti) > non-synonymous transversion (Ntv). The correlation between Sti and Stv values was strong (Pearson r = 0.795), whereas the correlation between Sti and Nti values was weak (Pearson r = 0.192). Coding sequences with similar Sti values exhibited a wide range of Nti values, indicating that the strength of purifying selection varies among coding sequences. Consistent with this interpretation, genes with higher Nti values had lower codon adaptation index (CAI) values than genes with lower Nti values. Our approach makes it convenient to visualize variation in base substitution frequency as well as selection in protein coding sequences. The proposed method is useful for estimating different ti/tv ratios accurately in coding sequences and is insightful from an evolutionary perspective.

Article Summary

Genetic diversity is pivotal in evolution, with base substitution as a key driver. Transition (ti) frequency surpasses transversion (tv) frequency in genomes, making ti/tv ratios a valuable metric for studying mutation bias. Our improved estimator for ti/tv calculation accounts for codon degeneracy and nonsense substitutions in pretermination codons. Additionally, we unveil insights into the frequency of different substitutions such as Sti, Stv, Nti, and Ntv and demonstrate the impact of selection on protein coding sequences.
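The substitution categories named in the abstract (Sti, Stv, Nti, Ntv, and nonsense changes in pretermination codons) can be illustrated with a small sketch. This is not the authors' CBBI implementation; it only assumes the standard genetic code, and the names `classify_substitution` and `pretermination_codons` are hypothetical helpers introduced here for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}


def is_transition(old, new):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (old in PURINES) == (new in PURINES)


def classify_substitution(codon, pos, new_base):
    """Label one single-base change in a sense codon (hypothetical helper).

    Returns 'Sti', 'Stv', 'Nti', 'Ntv', or 'nonsense_ti' / 'nonsense_tv'
    when the change turns a sense codon into a stop codon.
    """
    old_base = codon[pos]
    if new_base == old_base:
        raise ValueError("new_base must differ from the original base")
    mutant = codon[:pos] + new_base + codon[pos + 1:]
    kind = "ti" if is_transition(old_base, new_base) else "tv"
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if new_aa == "*" and old_aa != "*":
        return "nonsense_" + kind
    return ("S" if old_aa == new_aa else "N") + kind


def pretermination_codons():
    """Sense codons that a single base substitution can turn into a stop codon."""
    ptc = set()
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    ptc.add(codon)
    return ptc
```

For example, `classify_substitution("GAA", 2, "G")` gives `"Sti"` (GAA and GAG both encode glutamate), while `classify_substitution("TGG", 2, "A")` gives `"nonsense_ti"` because TGG is a pretermination codon one transition away from the stop codon TGA. Under the standard code, `pretermination_codons()` returns 18 codons.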

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Journal of Molecular Evolution: 21 papers in training set; Top 0.1%; 14.7%
2. BMC Ecology and Evolution: 49 papers in training set; Top 0.1%; 9.1%
3. PLOS ONE: 4510 papers in training set; Top 22%; 8.4%
4. Bioinformatics: 1061 papers in training set; Top 5%; 4.3%
5. Scientific Reports: 3102 papers in training set; Top 29%; 4.2%
6. PLOS Genetics: 756 papers in training set; Top 4%; 3.7%
7. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 2%; 3.6%
8. PeerJ: 261 papers in training set; Top 3%; 3.6%
50% of probability mass above
9. Frontiers in Ecology and Evolution: 60 papers in training set; Top 1%; 3.1%
10. PLOS Computational Biology: 1633 papers in training set; Top 12%; 2.7%
11. Genome Biology and Evolution: 280 papers in training set; Top 0.6%; 2.7%
12. Genes: 126 papers in training set; Top 0.6%; 2.1%
13. BMC Genomics: 328 papers in training set; Top 2%; 2.1%
14. Computers in Biology and Medicine: 120 papers in training set; Top 2%; 1.8%
15. Frontiers in Genetics: 197 papers in training set; Top 5%; 1.7%
16. BMC Bioinformatics: 383 papers in training set; Top 5%; 1.7%
17. Infection, Genetics and Evolution: 43 papers in training set; Top 0.5%; 1.5%
18. Biosystems: 18 papers in training set; Top 0.3%; 1.2%
19. Ecology and Evolution: 232 papers in training set; Top 3%; 0.9%
20. G3 Genes|Genomes|Genetics: 351 papers in training set; Top 2%; 0.9%
21. International Journal of Molecular Sciences: 453 papers in training set; Top 13%; 0.9%
22. Genomics: 60 papers in training set; Top 2%; 0.9%
23. Biochimie: 23 papers in training set; Top 0.3%; 0.9%
24. Journal of Theoretical Biology: 144 papers in training set; Top 2%; 0.8%
25. Molecular Genetics and Genomics: 11 papers in training set; Top 0.3%; 0.8%
26. NAR Genomics and Bioinformatics: 214 papers in training set; Top 3%; 0.8%
27. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 9%; 0.7%
28. Gene: 41 papers in training set; Top 2%; 0.7%
29. Biology Open: 130 papers in training set; Top 3%; 0.7%
30. Briefings in Bioinformatics: 326 papers in training set; Top 7%; 0.7%