Genomic dialects: How amino acid properties and the second codon base shape the informational accents of life

Martinez, O.; Ochoa-Alejo, N.

2026-04-24 bioinformatics

10.64898/2026.04.21.720023 bioRxiv


Codon Usage Bias (CUB) is a fundamental feature of genomic architecture, reflecting a balance between mutational pressure and natural selection. We propose a "genomic dialects" framework, in which species-specific CUB profiles represent "informational accents" constrained by biochemical and structural requirements. Using a normalized informational index based on Shannon's entropy, we analyzed CUB profiles for 18 amino acids across 1,406 species from the three domains of life. Linear models were used to investigate the relationship between CUB and amino acid physicochemical properties, including Saier's second-codon-base classification, molecular volume, hydrophobicity, aliphatic/aromatic status, and dissociation constants. CUB distributions are highly skewed, with more than 52% of values below 0.1, suggesting near-optimal use of the genetic code's potential. We demonstrate that amino acid properties significantly influence CUB, with Saier's classification explaining up to 69% of the variance in Archaea and approximately 47% across all taxa. Hydrophobic amino acids (Q1 class) consistently exhibit higher average CUB than hydrophilic ones, particularly in microbes. Individual species models reveal extreme correlations; for example, in the alga Chlamydomonas reinhardtii, Saier's classes explain more than 95% of CUB variance. Finally, we show that CUB-based dendrograms reflect phenetic similarity ("genomic accents") rather than reliable phylogenetic reconstructions, as they rarely coincide with the true Tree of Life. Our findings indicate that the "rules" of genomic dialects are largely anchored in the dual requirements of translational fidelity and protein stability. The observed "informational accents" are proximately governed by the metabolic and genomic machinery under the constraints of the drift-barrier hypothesis.
This study provides a robust framework for understanding how the physical realities of amino acids have shaped the evolution of the genetic code's informational use across the Tree of Life.
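The abstract does not give the exact formula for the entropy-based index, so the following is only a hedged sketch under a stated assumption. It supposes one common normalization, CUB = 1 - H/log2(k), where H is the Shannon entropy (in bits) of the codon frequencies for one amino acid and k is its number of synonymous codons. Under this assumption, 0 means perfectly uniform codon use and 1 means a single codon is used exclusively, which matches the reading that values below 0.1 indicate near-full use of the code's capacity. The sketch also computes the share of variance that a categorical predictor, such as Saier's class, explains in a linear model: R² equals the between-group sum of squares divided by the total sum of squares. The function names `normalized_cub` and `variance_explained` are hypothetical and are not the authors' code.

```python
import math
from collections import defaultdict


def normalized_cub(codon_counts):
    """Hypothetical CUB index for one amino acid (assumed form: 1 - H / log2(k)).

    codon_counts: observed counts of each synonymous codon (k = len(codon_counts)).
    Returns 0.0 for perfectly uniform usage and 1.0 when only one codon is used.
    """
    k = len(codon_counts)
    total = sum(codon_counts)
    if k < 2 or total == 0:
        raise ValueError("need >= 2 synonymous codons and a non-zero total count")
    # Shannon entropy (bits) of the observed codon frequencies; zero counts contribute 0
    h = -sum((c / total) * math.log2(c / total) for c in codon_counts if c > 0)
    # Divide by the maximum possible entropy, log2(k), then invert so higher = more biased
    return 1.0 - h / math.log2(k)


def variance_explained(values, groups):
    """R^2 of an OLS model with a single categorical predictor.

    For a one-factor model the fitted values are the group means, so
    R^2 = between-group sum of squares / total sum of squares.
    """
    n = len(values)
    grand_mean = sum(values) / n
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("all values are identical; R^2 is undefined")
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total


if __name__ == "__main__":
    # Toy inputs for illustration only, not data from the study
    print(normalized_cub([25, 25, 25, 25]))  # uniform use of four codons
    print(normalized_cub([10, 0]))           # only one of two codons used
    print(variance_explained([0.1, 0.2, 0.5, 0.6], ["A", "A", "B", "B"]))
```

Running the toy example prints 0.0 for uniform four-codon usage and 1.0 for exclusive use of one codon. Normalizing by log2(k) puts amino acids with two, four, or six synonymous codons on a common 0 to 1 scale, which is what allows values to be compared across the 18 amino acids.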

