Sequence-Based Generative AI-Guided Design of Versatile Tryptophan Synthases

Lambert, T.; Tavakoli, A.; Dharuman, G.; Yang, J.; Bhethanabotla, V.; Kaur, S.; Hill, M.; Ramanathan, A.; Anandkumar, A.; Arnold, F. H.

2025-08-30 biochemistry
10.1101/2025.08.30.673177 bioRxiv

Enzymes offer unparalleled selectivity and sustainability for chemical synthesis, yet their widespread industrial application is often hindered by the slow and uncertain process of discovering and optimizing suitable biocatalysts. While directed evolution remains the gold standard for enzyme optimization, its success hinges on the availability of a starting enzyme with measurable activity, a persistent bottleneck for many desired functions. Designing libraries likely to contain such functional starting points remains a major challenge. In this work, we use the GenSLM protein language model (PLM) along with a series of filters to generate novel sequences of the β-subunit of tryptophan synthase (TrpB) that express in Escherichia coli, are stable, and are catalytically active in the absence of a TrpA partner. Many generated TrpBs also demonstrated significant substrate promiscuity, accepting non-canonical substrates typically inaccessible to natural TrpBs. Remarkably, several outperformed both natural and laboratory-optimized TrpBs on native and non-canonical substrates. Comparative analysis of the most active and promiscuous generated TrpB and its closest natural homolog confirmed that this enhanced functional versatility does not stem from the natural enzyme, highlighting the creative potential of generative models. Our results demonstrate that the model can generate enzymes which not only preserve natural structure and function but also acquire non-natural properties, establishing PLMs as powerful tools for biocatalyst discovery and engineering, with the potential in some cases to bypass further optimization.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Angewandte Chemie International Edition · 81 papers in training set · Top 0.1% · 22.6%
2. ACS Catalysis · 16 papers in training set · Top 0.1% · 14.8%
3. Journal of the American Chemical Society · 199 papers in training set · Top 0.6% · 10.1%
4. Chemical Science · 71 papers in training set · Top 0.1% · 8.4%
50% of probability mass above
5. Nature Communications · 4913 papers in training set · Top 29% · 6.3%
6. ACS Central Science · 66 papers in training set · Top 0.5% · 3.1%
7. JACS Au · 35 papers in training set · Top 0.2% · 2.4%
8. Nature Chemistry · 34 papers in training set · Top 0.3% · 2.4%
9. Biochemistry · 130 papers in training set · Top 0.5% · 2.4%
10. Journal of Chemical Information and Modeling · 207 papers in training set · Top 2% · 1.9%
11. Proceedings of the National Academy of Sciences · 2130 papers in training set · Top 33% · 1.7%
12. Nature Chemical Biology · 104 papers in training set · Top 2% · 1.2%
13. Chemical Communications · 24 papers in training set · Top 0.7% · 1.2%
14. ACS Chemical Biology · 150 papers in training set · Top 1% · 1.2%
15. Journal of Natural Products · 11 papers in training set · Top 0.2% · 1.1%
16. ChemBioChem · 50 papers in training set · Top 0.9% · 1.0%
17. ACS Medicinal Chemistry Letters · 16 papers in training set · Top 0.4% · 1.0%
18. eLife · 5422 papers in training set · Top 51% · 1.0%
19. ACS Synthetic Biology · 256 papers in training set · Top 3% · 0.7%
20. Protein Science · 221 papers in training set · Top 2% · 0.7%
21. The Journal of Physical Chemistry Letters · 58 papers in training set · Top 2% · 0.7%
22. Computational and Structural Biotechnology Journal · 216 papers in training set · Top 10% · 0.7%
23. Cell Chemical Biology · 81 papers in training set · Top 4% · 0.6%