
The genetically-encoded amino acids distribute non-randomly within a functionally-relevant chemical space

Brown, S. M.; Hervey, J.; Dean, S. N.; Vora, G. J.

2026-05-07 | synthetic biology
10.64898/2026.05.06.723277 bioRxiv
Abstract

The standard set of 20 genetically-encoded amino acids (C20) exhibits a statistically non-random distribution in primarily two structurally-relevant physicochemical properties, hydrophobicity and molecular volume, and to a lesser extent in charge. It remains an open question, however, whether evolutionary pressures similarly optimized the same alphabet for the distribution of functionally-relevant properties, such as reactivity. In this study, we used semi-empirical quantum chemistry simulations to calculate the gaps between the highest occupied molecular orbital and the lowest unoccupied molecular orbital (HOMO-LUMO gaps) for 84 xeno amino acids, and we constructed 10 million random 20-member amino acid alphabets to determine where C20 fits within this background. The HOMO-LUMO gap measurements demonstrated that C20's distribution of HOMO-LUMO gaps, like its distributions of hydrophobicity and volume, is non-random. However, unlike for hydrophobicity and volume, this distribution is non-random across an unevenly broad range. The results expand upon previous theory and suggest the HOMO-LUMO gap energy as one property that synthetic biologists may consider when developing novel protein design tools or designing functional xeno amino acid alphabets.

Highlights
- Life's amino acid alphabet is non-randomly distributed within an expanded, computationally generated chemistry space derived from large-scale quantum chemistry simulations.
- Amino acid alphabet coverage theory applies beyond structurally-relevant physicochemical descriptors to include functionally-relevant properties, such as reactivity as measured by frontier molecular orbitals.
- These findings provide a theoretical framework to guide the design of novel proteins and the development of synthetic amino acid alphabets.
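The random-alphabet comparison described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline. The coverage metric used here, range plus evenness (measured as the variance of the gaps between sorted values), is an assumed stand-in for whatever metric the study applies. The helper names `homo_lumo_gap_ev`, `coverage`, and `fraction_better` are hypothetical. Orbital energies are assumed to come from a semi-empirical calculation reported in hartree. The demo uses randomly generated placeholder gap values, not the study's data, and draws 10,000 alphabets instead of 10 million.

```python
import random
import statistics

# CODATA 2018 value of the hartree energy expressed in eV.
HARTREE_TO_EV = 27.211386245988


def homo_lumo_gap_ev(e_homo: float, e_lumo: float) -> float:
    """HOMO-LUMO gap in eV from orbital energies given in hartree (assumed units)."""
    if e_lumo < e_homo:
        raise ValueError("LUMO energy must not lie below HOMO energy")
    return (e_lumo - e_homo) * HARTREE_TO_EV


def coverage(values: list[float]) -> tuple[float, float]:
    """Return (range, evenness) for a set of property values.

    Range is max - min. Evenness is the population variance of the gaps
    between consecutive sorted values; lower means more evenly spread.
    This metric is an illustrative assumption.
    """
    s = sorted(values)
    gaps = [b - a for a, b in zip(s, s[1:])]
    return s[-1] - s[0], statistics.pvariance(gaps)


def fraction_better(reference: list[float], pool: list[float],
                    n_samples: int = 10_000, k: int = 20, seed: int = 0) -> float:
    """Fraction of random k-member alphabets drawn from `pool` whose range is
    at least as wide AND whose gap variance is at most as large as `reference`.
    A small fraction suggests the reference set covers the property unusually well."""
    rng = random.Random(seed)
    ref_range, ref_even = coverage(reference)
    hits = 0
    for _ in range(n_samples):
        r, e = coverage(rng.sample(pool, k))
        if r >= ref_range and e <= ref_even:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    rng = random.Random(42)
    # Placeholder gap values in eV (hypothetical, NOT computed data).
    # Pool size and the choice to include the reference set in it are assumptions.
    pool = [rng.uniform(5.0, 10.0) for _ in range(104)]
    reference = pool[:20]
    frac = fraction_better(reference, pool, n_samples=10_000)
    print(f"fraction of random alphabets with equal or better coverage: {frac:.4f}")
```

Requiring a random alphabet to match the reference on both criteria at once is a deliberate, conservative choice for the sketch; scoring range and evenness separately would show the kind of split the abstract reports, where a distribution can be non-random yet uneven across its range.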

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling: 41.0% (207 papers in training set; top 0.1%)
2. Chemical Science: 9.5% (71 papers in training set; top 0.1%)
(50% of probability mass above)
3. Computational and Structural Biotechnology Journal: 8.5% (216 papers in training set; top 0.2%)
4. ACS Omega: 5.0% (90 papers in training set; top 0.1%)
5. PLOS Computational Biology: 2.8% (1633 papers in training set; top 11%)
6. The Journal of Physical Chemistry B: 2.7% (158 papers in training set; top 0.7%)
7. International Journal of Molecular Sciences: 2.0% (453 papers in training set; top 5%)
8. eLife: 1.4% (5422 papers in training set; top 46%)
9. Journal of the American Chemical Society: 1.4% (199 papers in training set; top 4%)
10. ACS Synthetic Biology: 1.3% (256 papers in training set; top 2%)
11. Scientific Reports: 1.2% (3102 papers in training set; top 67%)
12. JACS Au: 0.9% (35 papers in training set; top 0.8%)
13. The Journal of Physical Chemistry Letters: 0.8% (58 papers in training set; top 1%)
14. ChemBioChem: 0.8% (50 papers in training set; top 1.0%)
15. Journal of Chemical Theory and Computation: 0.8% (126 papers in training set; top 0.8%)
16. PLOS ONE: 0.8% (4510 papers in training set; top 65%)
17. Bioinformatics: 0.8% (1061 papers in training set; top 9%)
18. Communications Chemistry: 0.8% (39 papers in training set; top 1%)
19. Chemical Communications: 0.8% (24 papers in training set; top 1%)
20. Angewandte Chemie International Edition: 0.7% (81 papers in training set; top 4%)
21. PeerJ: 0.5% (261 papers in training set; top 18%)
22. Frontiers in Pharmacology: 0.5% (100 papers in training set; top 6%)
23. Journal of Molecular Biology: 0.5% (217 papers in training set; top 5%)
24. Computational Biology and Chemistry: 0.5% (23 papers in training set; top 0.7%)