
Mechanism-informed rules tunably balance novelty and feasibility of predicted enzymatic reactions

Pate, S. C.; Tyo, K. E.; Broadbelt, L. J.

2026-05-19 synthetic biology
10.64898/2026.05.18.726002 bioRxiv

Enzymes catalyze reactions with remarkable specificity and can unlock recalcitrant feedstocks that are dilute, complex, and variable in their constituent molecules. While characterized enzymatic reactions cover a wide range of chemistries, there is an undetermined number of cryptic activities for every known one. These cryptic activities can be elicited through rational design, adaptive laboratory evolution, and, increasingly, generative models of proteins. However, before tuning a catalyst, one must efficiently predict viable novel reactions. In this work we leverage the growing body of mechanistic enzyme information, specifically the Mechanism and Catalytic Site Atlas, to construct a set of reaction rules that can meet this demand. By explicitly using mechanistic information, the rule sets developed here identify the molecular structures required for catalysis more accurately than existing curated and heuristically constructed rules. The 899 Distilled rules are constructed directly from characterized mechanisms and cover 62.5% of reactions from Rhea. The Learned rule set is generated from a classifier trained on mechanistic data, allowing full coverage of Rhea and precise identification of mechanism-required atoms (ROC-AUC = 0.98). Additionally, our Learned rules exhibit a more favorable tradeoff between novelty and feasibility and give users fine-grained control over this tradeoff. The rules are compatible with any SMARTS-based reaction network expansion or retrosynthesis software.
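The ROC-AUC quoted for the Learned rule set summarizes how well the classifier ranks mechanism-required atoms above all other atoms. As a minimal sketch, assuming each atom receives one score and a binary "mechanism-required" label (the scores and labels below are made up for illustration and are not data from the paper), ROC-AUC equals the fraction of (required, non-required) atom pairs that the scores order correctly, with ties counted as one half:

```python
from itertools import product


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    scores: per-item classifier scores (here, hypothetical per-atom scores)
    labels: 1 for positive (mechanism-required atom), 0 otherwise
    Ties between a positive and a negative contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    # Compare every positive with every negative (pairwise, Mann-Whitney style).
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy example: six atoms, the first two are mechanism-required.
scores = [0.95, 0.80, 0.85, 0.30, 0.10, 0.05]
labels = [1, 1, 0, 0, 0, 0]
print(roc_auc(scores, labels))  # 7 of 8 pairs ordered correctly -> 0.875
```

On this reading, a value of 0.98 means that in roughly 98% of such pairs the required atom receives the higher score. The pairwise loop is quadratic and kept only for clarity; a rank-based computation gives the same result more efficiently on large atom sets.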

Matching journals

The top 4 journals account for just over 50% (51.5%) of the predicted probability mass.

1. Nature Communications: 25.9% (4913 papers in training set; Top 2%)
2. Science: 14.4% (429 papers in training set; Top 1%)
3. Nature Chemical Biology: 6.3% (104 papers in training set; Top 0.3%)
4. ACS Synthetic Biology: 4.9% (256 papers in training set; Top 0.7%)
(50% of probability mass above)
5. Journal of the American Chemical Society: 4.3% (199 papers in training set; Top 1%)
6. Proceedings of the National Academy of Sciences: 4.3% (2130 papers in training set; Top 16%)
7. eLife: 4.0% (5422 papers in training set; Top 22%)
8. Angewandte Chemie International Edition: 3.6% (81 papers in training set; Top 1.0%)
9. Chemical Science: 3.1% (71 papers in training set; Top 0.5%)
10. Nature: 2.6% (575 papers in training set; Top 9%)
11. Cell: 2.4% (370 papers in training set; Top 9%)
12. Cell Systems: 2.1% (167 papers in training set; Top 6%)
13. ACS Catalysis: 1.8% (16 papers in training set; Top 0.1%)
14. Nature Chemistry: 1.5% (34 papers in training set; Top 0.5%)
15. Nucleic Acids Research: 1.5% (1128 papers in training set; Top 12%)
16. Advanced Science: 1.5% (249 papers in training set; Top 12%)
17. Nature Methods: 1.3% (336 papers in training set; Top 5%)
18. ACS Central Science: 1.3% (66 papers in training set; Top 1%)
19. Journal of Chemical Information and Modeling: 1.0% (207 papers in training set; Top 3%)
20. Nature Biotechnology: 0.9% (147 papers in training set; Top 7%)
21. Science Advances: 0.9% (1098 papers in training set; Top 27%)
22. Cell Chemical Biology: 0.7% (81 papers in training set; Top 4%)
23. iScience: 0.6% (1063 papers in training set; Top 37%)