
AutoLead: An LLM-Guided Bayesian Optimization Framework for Multi-Objective Lead Optimization

Zhang, Y.; Choong, J. J.; Ozawa, K.

2025-08-23 bioinformatics
10.1101/2025.08.19.671029 bioRxiv

The process of lead optimization in drug discovery is a complex, multi-objective challenge that remains a major bottleneck in the development of new therapeutics. Traditional approaches often struggle to efficiently explore the vast chemical space while simultaneously optimizing multiple, and sometimes conflicting, molecular properties. In this work, we present AutoLead, a novel framework that integrates Large Language Models (LLMs) with multi-objective Bayesian optimization to tackle this challenge. By leveraging the chemical reasoning capabilities of LLMs, AutoLead effectively guides the search for novel drug-like molecules that satisfy multiple objectives. We evaluate our approach on two molecule optimization tasks, achieving state-of-the-art results. Furthermore, we introduce a new benchmark dataset designed around a more realistic lead optimization scenario, where the task is to modify compounds that violate Lipinski's Rule of Five to simultaneously meet all criteria and improve their QED score. Through extensive experiments and a detailed case study, we demonstrate the potential of combining LLMs with black-box optimization techniques for more efficient and practical drug discovery.
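The benchmark task above starts from compounds that violate Lipinski's Rule of Five and asks for modifications that meet all of its criteria. The classic cutoffs are molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors. The sketch below is a minimal pass/fail check under those cutoffs. It assumes the descriptors have already been computed, for example with a cheminformatics toolkit such as RDKit. The names `Descriptors`, `ro5_violations` and `meets_all_criteria` are hypothetical and do not come from AutoLead. QED scoring is not covered.

```python
from dataclasses import dataclass

# Classic Lipinski Rule-of-Five cutoffs (a value equal to the cutoff passes).
MAX_MW = 500.0   # molecular weight, Da
MAX_LOGP = 5.0   # octanol-water partition coefficient
MAX_HBD = 5      # hydrogen-bond donors
MAX_HBA = 10     # hydrogen-bond acceptors


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mw: float
    logp: float
    hbd: int
    hba: int


def ro5_violations(d: Descriptors) -> list:
    """Return the names of the Rule-of-Five criteria that d violates."""
    violations = []
    if d.mw > MAX_MW:
        violations.append("MW")
    if d.logp > MAX_LOGP:
        violations.append("logP")
    if d.hbd > MAX_HBD:
        violations.append("HBD")
    if d.hba > MAX_HBA:
        violations.append("HBA")
    return violations


def meets_all_criteria(d: Descriptors) -> bool:
    """Benchmark-style goal: the modified compound must pass every criterion."""
    return not ro5_violations(d)


if __name__ == "__main__":
    # Hypothetical starting compound: too heavy and too lipophilic.
    start = Descriptors(mw=612.7, logp=5.8, hbd=2, hba=9)
    print(ro5_violations(start))      # ['MW', 'logP']
    print(meets_all_criteria(start))  # False
```

The benchmark requires zero violations. The classic rule is often applied more leniently, flagging a compound only when it breaks more than one criterion, so the `meets_all_criteria` check is stricter by design.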

Matching journals

The top 4 journals account for just over 50% (52.8%) of the predicted probability mass.

1. Journal of Chemical Information and Modeling (207 papers in training set; Top 0.2%): 22.6%
2. Journal of Cheminformatics (25 papers in training set; Top 0.1%): 14.8%
3. Bioinformatics (1061 papers in training set; Top 3%): 10.5%
4. Briefings in Bioinformatics (326 papers in training set; Top 1%): 4.9%
50% of probability mass above
5. Bioinformatics Advances (184 papers in training set; Top 1%): 4.0%
6. Computational and Structural Biotechnology Journal (216 papers in training set; Top 1%): 4.0%
7. BMC Bioinformatics (383 papers in training set; Top 3%): 3.6%
8. Scientific Reports (3102 papers in training set; Top 36%): 3.6%
9. PLOS Computational Biology (1633 papers in training set; Top 11%): 3.1%
10. PLOS ONE (4510 papers in training set; Top 42%): 3.1%
11. Nature Communications (4913 papers in training set; Top 51%): 1.7%
12. iScience (1063 papers in training set; Top 18%): 1.5%
13. npj Systems Biology and Applications (99 papers in training set; Top 1%): 1.5%
14. IEEE Transactions on Computational Biology and Bioinformatics (17 papers in training set; Top 0.4%): 1.2%
15. Nucleic Acids Research (1128 papers in training set; Top 13%): 1.2%
16. Frontiers in Molecular Biosciences (100 papers in training set; Top 3%): 1.1%
17. Computers in Biology and Medicine (120 papers in training set; Top 3%): 1.0%
18. Communications Biology (886 papers in training set; Top 21%): 0.8%
19. Advanced Science (249 papers in training set; Top 17%): 0.8%
20. Artificial Intelligence in the Life Sciences (11 papers in training set; Top 0.2%): 0.7%
21. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 6%): 0.7%
22. Cell Systems (167 papers in training set; Top 13%): 0.7%
23. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 45%): 0.7%
24. International Journal of Molecular Sciences (453 papers in training set; Top 18%): 0.6%
25. IEEE/ACM Transactions on Computational Biology and Bioinformatics (32 papers in training set; Top 0.8%): 0.5%
26. Nature Machine Intelligence (61 papers in training set; Top 4%): 0.5%
27. IEEE Journal of Biomedical and Health Informatics (34 papers in training set; Top 3%): 0.5%
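The "top 4 journals" statement can be checked with simple arithmetic. Sort the predicted probabilities in descending order, accumulate them, and find the smallest k whose running total reaches 50%. The sketch below copies the first five probabilities from the list above. The function name `journals_for_mass` is a hypothetical helper chosen for illustration.

```python
# Predicted probabilities (%) for the highest-ranked journals, copied from the list above.
PREDICTIONS = [
    ("Journal of Chemical Information and Modeling", 22.6),
    ("Journal of Cheminformatics", 14.8),
    ("Bioinformatics", 10.5),
    ("Briefings in Bioinformatics", 4.9),
    ("Bioinformatics Advances", 4.0),
]


def journals_for_mass(preds, target=50.0):
    """Return (k, cumulative %) for the smallest k whose top-k sum reaches target.

    Returns None if the listed probabilities never reach the target.
    """
    total = 0.0
    ranked = sorted(preds, key=lambda item: item[1], reverse=True)
    for k, (_, p) in enumerate(ranked, start=1):
        total += p
        if total >= target:
            return k, round(total, 1)  # rounding hides float noise such as 52.800000000000004
    return None


if __name__ == "__main__":
    print(journals_for_mass(PREDICTIONS))  # (4, 52.8)
```

The top three journals sum to only 47.9%, so four is indeed the smallest set that covers half of the predicted mass.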