Minimal Amino Acid Alphabet for Protein Design

Pubal, K.; Kushnir, K.; Spiwok, V.; Louzecka, K.; Setnicka, V.; Lipovova, P.

2026-03-06 bioinformatics
10.64898/2026.03.06.710107 bioRxiv

Abstract: Proteins are built from 20 canonical amino acids. We explore whether proteins can be formed from significantly reduced amino acid alphabets. Our bioinformatics survey of UniProt (more than 250 M sequences) revealed that proteins composed of reduced amino acid alphabets (fewer than 10 amino acids) are extremely rare among existing proteins. Next, we used computational protein design to design proteins from all 1,013 possible alphabets of 2-10 early amino acids (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, and Val). All designed proteins were 100 amino acid residues long. Small amino acid alphabets preferred simple helices or helix bundles. Larger amino acid alphabets allowed for the design of more complex structures. A protein composed of 8 amino acids (Ala, Asp, Gly, Leu, Val, Ser, Thr, and Pro) was successfully verified experimentally. It adopts the β-sheet-rich fibronectin type III domain architecture. Attempts to experimentally verify designs composed of 6 and 4 amino acids were unsuccessful. In a computational experiment, without experimental validation, we show that inverse folding programs, namely ProteinMPNN, can stabilize designed proteins within the same amino acid alphabet. Our results suggest that globular proteins may have formed early in evolution. Furthermore, we show that it is possible to design proteins with interesting properties for biotechnology and synthetic biology.
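The figure of 1,013 alphabets follows from counting every subset of the ten early amino acids with at least two members: 2^10 - 1 - 10 = 1,013 (all subsets, minus the empty set, minus the ten single-residue sets). The short Python sketch below reproduces that count. It is an illustration only, not the authors' pipeline; the names `EARLY`, `enumerate_alphabets`, and `alphabet_size` are hypothetical, and the distinct-residue count is just one plausible way to flag reduced-alphabet sequences in a survey.

```python
from itertools import combinations

# The ten "early" amino acids named in the abstract, as one-letter codes
# (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, Val).
EARLY = "ADEGILPSTV"


def enumerate_alphabets(residues=EARLY, min_size=2, max_size=10):
    """Return every subset of `residues` with min_size..max_size members."""
    return [
        "".join(subset)
        for k in range(min_size, max_size + 1)
        for subset in combinations(residues, k)
    ]


def alphabet_size(sequence):
    """Number of distinct residue types used in a protein sequence."""
    return len(set(sequence.upper()))


alphabets = enumerate_alphabets()
print(len(alphabets))  # 1013 = 2**10 - 1 - 10
```

For example, `alphabet_size("AAGGLL")` returns 3, and the 8-residue alphabet of the experimentally verified design appears in the list as `"ADGLPSTV"`.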

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1 | Scientific Reports | 3102 papers in training set | Top 11% | 8.2%
2 | Bioinformatics | 1061 papers in training set | Top 4% | 6.2%
3 | Protein Science | 221 papers in training set | Top 0.2% | 6.2%
4 | BMC Bioinformatics | 383 papers in training set | Top 2% | 6.2%
5 | PLOS ONE | 4510 papers in training set | Top 29% | 6.2%
6 | PLOS Computational Biology | 1633 papers in training set | Top 6% | 6.2%
7 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 0.7% | 6.2%
8 | Computational Biology and Chemistry | 23 papers in training set | Top 0.1% | 4.1%
9 | Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.2% | 3.9%
50% of probability mass above
10 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 0.5% | 3.5%
11 | Molecules | 37 papers in training set | Top 0.7% | 1.8%
12 | PeerJ | 261 papers in training set | Top 7% | 1.7%
13 | Frontiers in Genetics | 197 papers in training set | Top 5% | 1.7%
14 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 1.7%
15 | Journal of Structural Biology | 58 papers in training set | Top 0.8% | 1.6%
16 | International Journal of Biological Macromolecules | 65 papers in training set | Top 2% | 1.4%
17 | Journal of Molecular Biology | 217 papers in training set | Top 2% | 1.3%
18 | ACS Omega | 90 papers in training set | Top 3% | 1.2%
19 | Biochemistry and Biophysics Reports | 28 papers in training set | Top 0.9% | 1.2%
20 | F1000Research | 79 papers in training set | Top 3% | 1.2%
21 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.1%
22 | Frontiers in Cell and Developmental Biology | 218 papers in training set | Top 8% | 0.9%
23 | Physical Biology | 43 papers in training set | Top 2% | 0.9%
24 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.8% | 0.8%
25 | Biomolecules | 95 papers in training set | Top 2% | 0.8%
26 | International Journal of Molecular Sciences | 453 papers in training set | Top 15% | 0.8%
27 | Frontiers in Bioengineering and Biotechnology | 88 papers in training set | Top 3% | 0.7%
28 | Bioinformatics Advances | 184 papers in training set | Top 5% | 0.7%
29 | Biophysical Journal | 545 papers in training set | Top 5% | 0.7%
30 | Biology | 43 papers in training set | Top 3% | 0.7%