The genetically-encoded amino acids distribute non-randomly within a functionally-relevant chemical space
Brown, S. M.; Hervey, J.; Dean, S. N.; Vora, G. J.
The standard set of 20 genetically-encoded amino acids (C20) exhibits a statistically non-random distribution in two structurally-relevant physicochemical properties, hydrophobicity and molecular volume, and to a lesser extent in charge. It remains an open question, however, whether evolutionary pressures similarly optimized the same alphabet for the distribution of functionally-relevant properties, such as reactivity. In this study, we used semi-empirical quantum chemistry simulations to calculate the gaps between the highest occupied and lowest unoccupied molecular orbitals (HOMO-LUMO gaps) for 84 xeno amino acids, and constructed 10 million random 20-member amino acid alphabets to determine where C20 fits against this background. The HOMO-LUMO gap measurements demonstrated that C20 also exhibits a non-random distribution, as it does for hydrophobicity and volume. However, unlike for hydrophobicity and volume, this distribution is non-random across a broad but uneven range. The results expand upon previous theory and suggest HOMO-LUMO gap energy as a property that synthetic biologists may consider when developing novel protein design tools or designing functional xeno amino acid alphabets.

Highlights
- Life's amino acid alphabet is non-randomly distributed within an expanded, computationally generated chemistry space derived from large-scale quantum chemistry simulations.
- Amino acid alphabet coverage theory applies beyond structurally-relevant physicochemical descriptors to include functionally-relevant properties such as reactivity, as measured by frontier molecular orbitals.
- These findings provide a theoretical framework to guide the design of novel proteins and the development of synthetic amino acid alphabets.
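The comparison described in the abstract (score the reference alphabet, then see how many random 20-member alphabets score better) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say which coverage statistic was used, so the sketch borrows two common ones, range (max minus min) and evenness (variance of the spacing between sorted neighbours, lower meaning more even). All function names are hypothetical, the property values are synthetic placeholders rather than computed HOMO-LUMO gaps, the pool size of 104 is illustrative only, and the default of 10,000 alphabets is far fewer than the study's 10 million. The `homo_lumo_gap` helper assumes a closed-shell molecule whose orbital energies are already available from some quantum chemistry calculation.

```python
import random
import statistics


def homo_lumo_gap(orbital_energies, n_electrons):
    """HOMO-LUMO gap for a closed-shell molecule (same units as the input).

    Assumption: every occupied orbital holds two electrons, so after sorting
    energies ascending the HOMO is index n_electrons//2 - 1 and the LUMO is
    the next orbital up.
    """
    if n_electrons % 2:
        raise ValueError("closed-shell sketch expects an even electron count")
    levels = sorted(orbital_energies)
    n_occ = n_electrons // 2
    if not 0 < n_occ < len(levels):
        raise ValueError("need at least one occupied and one virtual orbital")
    return levels[n_occ] - levels[n_occ - 1]


def coverage(values):
    """Hypothetical coverage metrics for one alphabet: (range, evenness).

    range    = max - min (broader counts as better coverage)
    evenness = population variance of gaps between sorted neighbours
               (lower counts as more even coverage)
    """
    s = sorted(values)
    spacings = [b - a for a, b in zip(s, s[1:])]
    return s[-1] - s[0], statistics.pvariance(spacings)


def fraction_better(reference, pool, n_alphabets=10_000, size=20, seed=0):
    """Fraction of random alphabets drawn from `pool` that match or beat
    `reference` on BOTH metrics (at least as broad, at least as even)."""
    rng = random.Random(seed)
    ref_range, ref_var = coverage(reference)
    better = 0
    for _ in range(n_alphabets):
        alphabet = rng.sample(pool, size)  # sample without replacement
        r, v = coverage(alphabet)
        if r >= ref_range and v <= ref_var:
            better += 1
    return better / n_alphabets


if __name__ == "__main__":
    # Synthetic placeholder values in eV, NOT results from the study.
    rng = random.Random(42)
    pool = [rng.uniform(4.0, 10.0) for _ in range(104)]
    reference = pool[:20]  # stand-in for a "C20"-like reference alphabet
    print("fraction of random alphabets that do better:",
          fraction_better(reference, pool))
    print("toy gap:", homo_lumo_gap([-10.2, -8.1, -0.5, 1.3], 4))
```

A small fraction returned by `fraction_better` would indicate that few random alphabets cover the property space both more broadly and more evenly than the reference, which is the sense in which such analyses call a distribution non-random. Requiring both criteria at once is a design choice of this sketch; testing each metric separately would distinguish a broad-but-uneven distribution like the one the abstract reports for HOMO-LUMO gaps.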