
Benchmarking and Experimental Validation of Machine Learning Strategies for Enzyme Engineering

Zeng, Z.; Jin, J.; Xu, R.; Luo, X.

2026-03-30 bioengineering
10.64898/2026.03.29.715152 bioRxiv

Directed evolution of enzymes increasingly relies on computational tools to prioritize mutations, yet their practical value is difficult to assess because kinetic data are often aggregated across heterogeneous assay conditions, inflating apparent generalization. Here we introduce EnzyArena, a curated benchmark that groups kinetic parameters (kcat, Km, kcat/Km) into condition-matched experimental subsets to enable realistic evaluation. Using this resource, we benchmark 10 representative models from two emerging strategy families (zero-shot fitness prediction and supervised kinetic-parameter prediction) across BRENDA- and SABIO-RK-derived subsets and 25 independent mutagenesis datasets. Kinetic-parameter predictors perform strongly on database-derived subsets but lose their advantage on independent datasets, whereas zero-shot predictors generalize more consistently. A simple consensus of multiple zero-shot models further improves the precision of identifying beneficial mutants. We prospectively validated these findings in a wet-lab campaign (150 mutants) comparing random mutants, UniKP-prioritized mutants, and ESM-1v-prioritized mutants (representing supervised kinetic-parameter prediction and zero-shot fitness prediction, respectively), in which ESM-1v achieved the highest utility and UniKP underperformed the random baseline. Together, this study establishes realistic baselines for computational mutant prioritization and highlights consensus zero-shot strategies as a practical starting point for enzyme engineering.
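The abstract does not say how the "simple consensus" of zero-shot models is formed. As a minimal sketch of one common way to build such a consensus, assuming rank averaging across models (an assumption, not necessarily the authors' method), the idea might look like the following. The model names, mutant labels, and scores are hypothetical and serve only as an illustration.

```python
from statistics import mean


def rank_fractions(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw scores to rank fractions in [0, 1] (1.0 = highest score).

    Tied scores share the average rank of their tie group.
    """
    ordered = sorted(scores, key=scores.get)  # ascending by score
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    fractions: dict[str, float] = {}
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of mutants tied with ordered[i].
        while j + 1 < n and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2  # 0-based average rank of the tie group
        for k in range(i, j + 1):
            fractions[ordered[k]] = average_rank / (n - 1)
        i = j + 1
    return fractions


def consensus_ranking(
    model_scores: dict[str, dict[str, float]],
) -> list[tuple[str, float]]:
    """Average per-model rank fractions over mutants scored by every model.

    Rank averaging is an assumed consensus rule; the source only states
    that a "simple consensus" was used.
    """
    shared = set.intersection(*(set(s) for s in model_scores.values()))
    per_model = [
        rank_fractions({mutant: s[mutant] for mutant in shared})
        for s in model_scores.values()
    ]
    consensus = {m: mean(r[m] for r in per_model) for m in shared}
    # Best consensus first; break ties alphabetically for determinism.
    return sorted(consensus.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical zero-shot scores (higher = predicted more favorable).
example_scores = {
    "model_a": {"A10V": 0.8, "G45S": -1.2, "L77F": 0.1, "K102R": -0.4},
    "model_b": {"A10V": 0.5, "G45S": -0.9, "L77F": 0.7, "K102R": -0.1},
    "model_c": {"A10V": 1.1, "G45S": -2.0, "L77F": -0.3, "K102R": 0.2},
}

ranking = consensus_ranking(example_scores)
for mutant, score in ranking:
    print(f"{mutant}: {score:.3f}")
```

Working on ranks rather than raw scores avoids having to put different models' outputs on a common scale, which is one reason rank averaging is a frequent default for this kind of ensemble.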

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Cell Systems | 167 | Top 0.2% | 22.8% |
| 2 | Nature Communications | 4913 | Top 9% | 14.9% |
| 3 | Proceedings of the National Academy of Sciences | 2130 | Top 9% | 6.9% |
| 4 | ACS Catalysis | 16 | Top 0.1% | 6.9% |
| | (50% of probability mass above) | | | |
| 5 | Angewandte Chemie International Edition | 81 | Top 0.7% | 4.4% |
| 6 | eLife | 5422 | Top 24% | 3.6% |
| 7 | PLOS Computational Biology | 1633 | Top 12% | 2.8% |
| 8 | Nucleic Acids Research | 1128 | Top 7% | 2.8% |
| 9 | Protein Engineering, Design and Selection | 14 | Top 0.1% | 2.6% |
| 10 | ACS Synthetic Biology | 256 | Top 1% | 2.1% |
| 11 | Science | 429 | Top 13% | 1.9% |
| 12 | ACS Central Science | 66 | Top 1% | 1.7% |
| 13 | Nature Biotechnology | 147 | Top 5% | 1.5% |
| 14 | Scientific Reports | 3102 | Top 62% | 1.5% |
| 15 | PLOS ONE | 4510 | Top 58% | 1.3% |
| 16 | Journal of the American Chemical Society | 199 | Top 3% | 1.3% |
| 17 | Nature Methods | 336 | Top 5% | 1.1% |
| 18 | Computational and Structural Biotechnology Journal | 216 | Top 7% | 1.1% |
| 19 | Cell Genomics | 162 | Top 5% | 1.0% |
| 20 | Science Advances | 1098 | Top 25% | 1.0% |
| 21 | Nature Chemical Biology | 104 | Top 3% | 0.9% |
| 22 | Nature Machine Intelligence | 61 | Top 3% | 0.8% |
| 23 | Advanced Science | 249 | Top 17% | 0.8% |
| 24 | Nature | 575 | Top 15% | 0.8% |
| 25 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.8% |
| 26 | Cell Chemical Biology | 81 | Top 4% | 0.7% |
| 27 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 1% | 0.7% |
| 28 | International Journal of Molecular Sciences | 453 | Top 19% | 0.5% |
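The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below uses the first eight percentages from the list; the function name `journals_to_reach` is hypothetical and not part of the site.

```python
from itertools import accumulate


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running sum
    of probabilities (in percent) to reach the threshold, or None if the
    listed entries never reach it."""
    for count, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return count
    return None


# Predicted probabilities (percent) of the eight highest-ranked journals.
listed = [22.8, 14.9, 6.9, 6.9, 4.4, 3.6, 2.8, 2.8]

print(journals_to_reach(listed, 50.0))  # prints 4
```

The first four entries sum to 51.5%, so the 50% mark falls between ranks 4 and 5.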