Bridging LLM Reasoning and Chemical Knowledge via an Evolutionary Multi-Agent Framework for Molecular Synthesis

Chen, Y.; Rao, J.; Xie, J.; Sun, Y.; Yang, Y.

2026-05-06 bioinformatics
10.64898/2026.05.02.722342 bioRxiv

Motivation: Molecular design faces the dual challenge of navigating a vast chemical space while ensuring experimental synthesizability. Traditional models are constrained by small datasets, restricting their scalability and broader chemical context. In contrast, Large Language Models (LLMs) encapsulate extensive synthesis protocols derived from vast scientific literature, yet they struggle to leverage this potential due to severe hallucinations and a superficial grasp of rigorous chemical logic.

Results: We propose EvoSyn, an evolutionary multi-agent framework that synergizes LLM reasoning with domain experts for preference-aware molecular synthesis. EvoSyn orchestrates a dual-process evolutionary paradigm: a co-evolving process that collaboratively aligns linguistic capabilities with multi-objective constraints, and a self-evolving process formulated as a Markov Game. Through evolution and reinforcement learning, agents actively learn from mistakes, utilizing domain feedback to penalize invalid proposals and ground generation in feasible reaction pathways. Extensive evaluations on comprehensive benchmarks demonstrate that EvoSyn significantly outperforms state-of-the-art baselines. These results highlight that by integrating LLM-guided self-evolution with rigorous domain validation to mitigate hallucinations, EvoSyn effectively yields molecules that are both bioactive and synthetically actionable.

Availability and implementation: Implementation code is available as supplementary material.

Contact: yangyd25@mail.sysu.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Matching journals

The top 3 journals together account for more than 50% of the predicted probability mass (23.0% + 19.0% + 14.6% = 56.6%).

1. Bioinformatics (1061 papers in training set; Top 1%): 23.0%
2. Journal of Chemical Information and Modeling (207 papers in training set; Top 0.3%): 19.0%
3. Computational and Structural Biotechnology Journal (216 papers in training set; Top 0.1%): 14.6%

50% of probability mass above

4. Briefings in Bioinformatics (326 papers in training set; Top 1%): 4.4%
5. Nature Communications (4913 papers in training set; Top 39%): 3.7%
6. Advanced Science (249 papers in training set; Top 6%): 3.1%
7. PLOS Computational Biology (1633 papers in training set; Top 13%): 2.1%
8. iScience (1063 papers in training set; Top 12%): 1.8%
9. BMC Bioinformatics (383 papers in training set; Top 4%): 1.7%
10. ACS Synthetic Biology (256 papers in training set; Top 2%): 1.7%
11. Journal of Cheminformatics (25 papers in training set; Top 0.3%): 1.7%
12. PLOS ONE (4510 papers in training set; Top 56%): 1.5%
13. Bioinformatics Advances (184 papers in training set; Top 4%): 1.2%
14. Nature Machine Intelligence (61 papers in training set; Top 3%): 1.2%
15. Scientific Reports (3102 papers in training set; Top 70%): 0.9%
16. Journal of Chemical Theory and Computation (126 papers in training set; Top 0.7%): 0.9%
17. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 5%): 0.8%
18. npj Systems Biology and Applications (99 papers in training set; Top 2%): 0.8%
19. Frontiers in Cell and Developmental Biology (218 papers in training set; Top 9%): 0.7%
20. International Journal of Molecular Sciences (453 papers in training set; Top 17%): 0.7%
21. Communications Chemistry (39 papers in training set; Top 1%): 0.7%
22. Chemical Science (71 papers in training set; Top 3%): 0.5%
23. Cell Systems (167 papers in training set; Top 15%): 0.5%