Genomic dialects: How amino acid properties and the second codon base shape the informational accents of life

Martinez, O.; Ochoa-Alejo, N.

2026-04-24 bioinformatics
10.64898/2026.04.21.720023 bioRxiv

Codon Usage Bias (CUB) is a fundamental feature of genomic architecture, reflecting a balance between mutational pressure and natural selection. We propose a "genomic dialects" framework, in which species-specific CUB profiles represent "informational accents" constrained by biochemical and structural requirements. Using a normalized informational index based on Shannon's entropy, we analyzed CUB profiles for 18 amino acids across 1,406 species from the three domains of life. Linear models were used to investigate the relationship between CUB and physicochemical properties, including Saier's second-codon-base classification, molecular volume, hydrophobicity, aliphatic/aromatic status, and dissociation constants. CUB distributions are highly skewed, with more than 52% of values below 0.1, suggesting near-optimal use of the genetic code's potential. We demonstrate that amino acid properties significantly influence CUB, with Saier's classification explaining up to 69% of the variance in Archaea and approximately 47% across all taxa. Hydrophobic amino acids (Q1 class) consistently exhibit higher average CUB than hydrophilic ones, particularly in microbes. Models fitted to individual species reveal extreme correlations; for example, in the alga Chlamydomonas reinhardtii, Saier classes explain more than 95% of CUB variance. Finally, we show that CUB-based dendrograms represent phenetic similarity ("genomic accents") rather than reliable phylogenetic reconstructions, as they rarely coincide with the true Tree of Life. Our findings indicate that the "rules" of genomic dialects are largely anchored in the dual requirements of translational fidelity and protein stability. The observed "informational accents" are proximately governed by the metabolic and genomic machinery under the constraints of the drift-barrier hypothesis. This study provides a robust framework for understanding how the physical realities of amino acids have shaped the evolution of the genetic code's informational use across the tree of life.
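The two quantitative ideas in the abstract, an entropy-based bias index and the share of variance "explained" by Saier classes, can be illustrated with a minimal Python sketch. It assumes the index is 1 - H/ln(k), where H is the Shannon entropy of an amino acid's k synonymous-codon frequencies, so 0 means uniform codon use and 1 means a single codon is used exclusively. This normalization is an assumption and may differ from the authors' exact definition. The helper names `cub_index` and `r_squared_by_class` are hypothetical, and the codon counts and class labels are invented placeholders, not data from the study.

```python
import math
from collections import defaultdict


def cub_index(codon_counts):
    """Hypothetical normalized entropy-based bias for one amino acid.

    Returns 0.0 when all synonymous codons are used equally and 1.0 when a
    single codon is used exclusively. Assumes at least 2 synonymous codons.
    """
    k = len(codon_counts)
    total = sum(codon_counts)
    if k < 2 or total == 0:
        raise ValueError("need >= 2 synonymous codons and a nonzero total")
    # Shannon entropy (natural log) of the observed codon frequencies
    h = -sum((c / total) * math.log(c / total) for c in codon_counts if c > 0)
    # Divide by the maximum entropy ln(k) so codon families of different
    # sizes are comparable; clamp tiny negative rounding errors to 0
    return max(0.0, 1.0 - h / math.log(k))


def r_squared_by_class(values, classes):
    """R^2 of a linear model with one categorical predictor.

    The fitted value for each observation is its class mean, so
    R^2 = SS_between / SS_total.
    """
    grand_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for v, c in zip(values, classes):
        groups[c].append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("values have zero variance")
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


# Hypothetical codon counts for one genome: amino acid -> counts per synonymous codon
toy_counts = {"Phe": [80, 20], "Leu": [40, 10, 10, 10, 20, 10],
              "Lys": [55, 45], "Asp": [50, 50]}
# Placeholder class labels (only "Q1" is named in the abstract)
toy_class = {"Phe": "Q1", "Leu": "Q1", "Lys": "hydrophilic", "Asp": "hydrophilic"}

aas = sorted(toy_counts)
indices = [cub_index(toy_counts[aa]) for aa in aas]
labels = [toy_class[aa] for aa in aas]
for aa, idx in zip(aas, indices):
    print(f"{aa}: {idx:.3f}")
print(f"R^2 by class: {r_squared_by_class(indices, labels):.3f}")
```

Because a one-factor linear model predicts each observation by its class mean, R² in this sketch is the fraction of total variance lying between classes, which is presumably the sense in which a classification "explains" a percentage of CUB variance.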

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Molecular Biology and Evolution: 488 papers in training set; Top 0.1%; 22.1%
2. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 2%; 18.2%
3. PLOS Computational Biology: 1633 papers in training set; Top 6%; 6.2%
4. eLife: 5422 papers in training set; Top 18%; 4.8%
(50% of probability mass above)
5. Nature Communications: 4913 papers in training set; Top 34%; 4.8%
6. Nucleic Acids Research: 1128 papers in training set; Top 4%; 4.8%
7. Cell Systems: 167 papers in training set; Top 3%; 4.2%
8. Science Advances: 1098 papers in training set; Top 4%; 3.9%
9. Journal of Molecular Biology: 217 papers in training set; Top 1%; 1.8%
10. Scientific Reports: 3102 papers in training set; Top 56%; 1.7%
11. NAR Genomics and Bioinformatics: 214 papers in training set; Top 2%; 1.5%
12. Journal of The Royal Society Interface: 189 papers in training set; Top 3%; 1.5%
13. mSystems: 361 papers in training set; Top 6%; 1.2%
14. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 7%; 1.1%
15. iScience: 1063 papers in training set; Top 25%; 0.9%
16. Genome Biology and Evolution: 280 papers in training set; Top 2%; 0.9%
17. Genome Biology: 555 papers in training set; Top 7%; 0.9%
18. PLOS ONE: 4510 papers in training set; Top 64%; 0.9%
19. Frontiers in Microbiology: 375 papers in training set; Top 8%; 0.8%
20. Current Biology: 596 papers in training set; Top 13%; 0.8%
21. Genetics: 225 papers in training set; Top 4%; 0.7%
22. Advanced Science: 249 papers in training set; Top 21%; 0.7%
23. Cell Reports: 1338 papers in training set; Top 35%; 0.7%
24. Science: 429 papers in training set; Top 21%; 0.7%
25. Genome Research: 409 papers in training set; Top 5%; 0.6%
26. npj Systems Biology and Applications: 99 papers in training set; Top 3%; 0.6%