
EnzySeek: Efficient Exploration of Enzyme Reaction Pathways Using AI Agents

Kang, X.; Yu, T.; Xu, K.; Liu, C.; Wu, R.

2026-03-02 biochemistry
10.64898/2026.03.02.708939 bioRxiv

With the rapid development of Large Language Models (LLMs) and agent technologies, AI can assist in solving a variety of real-world problems across multiple domains, such as autonomous driving, drug discovery, and materials design. In this work, we present EnzySeek, an AI agent designed to assist researchers in enzyme catalysis simulations. First, we constructed a domain-specific knowledge base by curating thousands of papers related to enzyme catalysis. Second, we customized Model Context Protocol (MCP) interfaces for each step of the enzyme catalysis simulation workflow, enabling these functions to be invoked by LLMs. Finally, we configured an agent capable of referencing past empirical studies on enzyme catalysis, autonomously executing tool calls, and analyzing and presenting the results. EnzySeek's capabilities include protein structure prediction, molecular docking, system preparation and parameterization, molecular dynamics (MD) simulations, and QM/MM calculations. The conclusions drawn by EnzySeek are primarily based on the results of QM/MM calculations. We employed the semi-empirical quantum mechanical method GFN2-xTB to calculate the QM region of the system. Benchmark results indicate that the GFN2-xTB method can achieve high efficiency while maintaining accuracy. The EnzySeek agent is designed to continuously learn from newly published literature and past computational tasks. During its operation, every AI decision is manually verified and scored by human experts. This human-in-the-loop validation provides the AI with sufficient case-based support, ultimately contributing to the full automation of enzyme catalysis computations. All data generated during the simulations are compiled into a dataset, which is used to establish evaluation criteria specific to enzyme catalysis computational results.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Top percentile | Probability
1 | Journal of Chemical Information and Modeling | 207 | Top 0.1% | 33.4%
2 | Journal of Cheminformatics | 25 | Top 0.1% | 10.2%
3 | Bioinformatics | 1061 | Top 4% | 6.4%
(50% of probability mass above)
4 | PLOS Computational Biology | 1633 | Top 8% | 4.2%
5 | Nucleic Acids Research | 1128 | Top 5% | 4.0%
6 | Journal of Molecular Biology | 217 | Top 0.4% | 4.0%
7 | PLOS ONE | 4510 | Top 38% | 3.6%
8 | Computational and Structural Biotechnology Journal | 216 | Top 2% | 2.6%
9 | Frontiers in Molecular Biosciences | 100 | Top 1% | 2.1%
10 | Nature Communications | 4913 | Top 48% | 1.9%
11 | SoftwareX | 15 | Top 0.1% | 1.9%
12 | Protein Science | 221 | Top 0.7% | 1.8%
13 | ACS Omega | 90 | Top 2% | 1.7%
14 | Scientific Reports | 3102 | Top 57% | 1.7%
15 | Journal of Chemical Theory and Computation | 126 | Top 0.5% | 1.7%
16 | eLife | 5422 | Top 51% | 1.0%
17 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 0.7% | 1.0%
18 | The Journal of Physical Chemistry Letters | 58 | Top 1% | 0.9%
19 | IUCrJ | 29 | Top 0.3% | 0.8%
20 | Bioinformatics Advances | 184 | Top 4% | 0.8%
21 | Chemical Science | 71 | Top 2% | 0.8%
22 | Briefings in Bioinformatics | 326 | Top 6% | 0.8%
23 | Acta Crystallographica Section D Structural Biology | 54 | Top 0.4% | 0.7%
24 | The Journal of Physical Chemistry B | 158 | Top 2% | 0.7%
25 | Journal of Structural Biology | 58 | Top 2% | 0.5%
26 | iScience | 1063 | Top 40% | 0.5%
27 | Proceedings of the National Academy of Sciences | 2130 | Top 49% | 0.5%
28 | Communications Biology | 886 | Top 32% | 0.5%
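
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The following is a minimal sketch, not part of the page itself: the function name `journals_to_reach` and the choice to store values in tenths of a percent (so the sums stay exact integers) are assumptions made for illustration, and the inputs are the rounded percentages shown above for the first ten entries.

```python
from itertools import accumulate

# Probabilities copied from the first ten entries of the list above,
# in tenths of a percent (334 means 33.4%) so sums are exact integers.
probs_tenths = [334, 102, 64, 42, 40, 40, 36, 26, 21, 19]


def journals_to_reach(threshold_tenths, probs):
    """Return how many top-ranked entries are needed to reach the threshold.

    Hypothetical helper for illustration; returns None if the listed
    entries never reach the threshold.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold_tenths:
            return count
    return None


print(journals_to_reach(500, probs_tenths))  # → 3
```

Because the displayed percentages are rounded, the check is only as precise as the values shown on the page.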