Identifying a logical specification and a program for an LLM-based generator of lead molecules

Srinivasan, A.; Dash, T.; Baskar, A.; Dey, S. K.; Banerjee, M.

2025-02-16 bioinformatics
10.1101/2025.02.14.634875 bioRxiv

Our interest is in the generation of "lead" molecules in early-stage drug design. Leads are small molecules (ligands) that can bind to a part of a pre-specified target and also satisfy multiple physico-chemical constraints. We propose using techniques developed in Inductive Logic Programming (ILP) to identify a logical specification of feasible molecules, and then using this specification to construct a program that uses a large language model (LLM) to generate new molecules. We ensure the program constructed is correct, in the sense that every molecule generated by the program is feasible according to the specification. Our focus is on contributing to ongoing drug-discovery research on novel inhibitors for Dopamine β-hydroxylase (DBH), an enzyme that plays a pivotal role in several diseases related to the brain and the heart. We find molecules comparable in affinity to the latest-generation drugs currently in clinical trials, and provide a chemical assessment of the synthesisability of the molecules generated. For completeness, we also provide results obtained on the classic benchmark datasets used in recent work reported in [1].
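The correctness property described in the abstract (every molecule the program outputs is feasible according to the specification) can be illustrated with a simple generate-and-test loop. The sketch below is not the authors' program, and the abstract does not say how their program is constructed. Everything in it is an assumption made for illustration: the property names (`mol_weight`, `logp`, `affinity`), the thresholds, and `stub_generator`, which stands in for an LLM with a random proposal source.

```python
import random
from typing import Callable, Dict, Iterator, List

Molecule = Dict[str, float]          # hypothetical: a molecule as named properties
Constraint = Callable[[Molecule], bool]

# Hypothetical specification: a conjunction of property constraints.
# An ILP system would learn such a specification; here it is written by hand.
SPEC: List[Constraint] = [
    lambda m: m["mol_weight"] <= 500.0,
    lambda m: m["logp"] <= 5.0,
    lambda m: m["affinity"] >= 7.0,
]


def is_feasible(mol: Molecule, spec: List[Constraint] = SPEC) -> bool:
    """A molecule is feasible when it satisfies every constraint."""
    return all(constraint(mol) for constraint in spec)


def stub_generator(rng: random.Random) -> Iterator[Molecule]:
    """Stand-in for an LLM: yields random candidate molecules forever."""
    while True:
        yield {
            "mol_weight": rng.uniform(150.0, 700.0),
            "logp": rng.uniform(-1.0, 8.0),
            "affinity": rng.uniform(4.0, 10.0),
        }


def generate_feasible(n: int, proposals: Iterator[Molecule],
                      max_tries: int = 10_000) -> List[Molecule]:
    """Return up to n candidates, keeping only those that pass the spec.

    Nothing is returned without passing is_feasible, so the output is
    feasible by construction, whatever the proposal source produces.
    """
    accepted: List[Molecule] = []
    for _, mol in zip(range(max_tries), proposals):
        if is_feasible(mol):
            accepted.append(mol)
            if len(accepted) == n:
                break
    return accepted


leads = generate_feasible(5, stub_generator(random.Random(0)))
```

The filter makes the guarantee independent of the generator's quality: a poor generator lowers the acceptance rate but cannot produce an infeasible output.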

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS ONE: 4510 papers in training set; Top 13%; 14.4%
2. Bioinformatics: 1061 papers in training set; Top 3%; 10.2%
3. BMC Bioinformatics: 383 papers in training set; Top 1%; 8.5%
4. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 32 papers in training set; Top 0.1%; 6.4%
5. Scientific Reports: 3102 papers in training set; Top 23%; 4.9%
6. PLOS Computational Biology: 1633 papers in training set; Top 7%; 4.9%
7. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 1%; 3.7%
50% of probability mass above
8. Bioinformatics Advances: 184 papers in training set; Top 1%; 3.6%
9. Briefings in Bioinformatics: 326 papers in training set; Top 3%; 1.9%
10. Molecules: 37 papers in training set; Top 0.6%; 1.9%
11. Frontiers in Molecular Biosciences: 100 papers in training set; Top 1%; 1.8%
12. Journal of Chemical Information and Modeling: 207 papers in training set; Top 2%; 1.7%
13. Computers in Biology and Medicine: 120 papers in training set; Top 2%; 1.7%
14. Journal of Cheminformatics: 25 papers in training set; Top 0.3%; 1.7%
15. Journal of Computational Biology: 37 papers in training set; Top 0.2%; 1.5%
16. Artificial Intelligence in the Life Sciences: 11 papers in training set; Top 0.1%; 1.3%
17. npj Systems Biology and Applications: 99 papers in training set; Top 2%; 1.2%
18. Frontiers in Genetics: 197 papers in training set; Top 7%; 1.1%
19. iScience: 1063 papers in training set; Top 24%; 1.0%
20. IEEE Access: 31 papers in training set; Top 0.8%; 0.9%
21. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 2%; 0.8%
22. Pharmaceutics: 21 papers in training set; Top 0.4%; 0.8%
23. Physical Biology: 43 papers in training set; Top 2%; 0.8%
24. Genes: 126 papers in training set; Top 3%; 0.8%
25. BMC Genomics: 328 papers in training set; Top 6%; 0.8%
26. International Journal of Molecular Sciences: 453 papers in training set; Top 15%; 0.8%
27. BioData Mining: 15 papers in training set; Top 0.9%; 0.8%
28. Biosystems: 18 papers in training set; Top 0.5%; 0.7%
29. Frontiers in Pharmacology: 100 papers in training set; Top 5%; 0.7%
30. Pharmaceuticals: 33 papers in training set; Top 2%; 0.6%
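
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked from the listed percentages with a running sum. A minimal sketch, assuming the final percentage in each entry is that journal's predicted probability; the function name `journals_to_reach` is made up for this example.

```python
from itertools import accumulate
from typing import List, Optional

# Predicted probabilities (%) of the ten highest-ranked journals, in rank order.
probs: List[float] = [14.4, 10.2, 8.5, 6.4, 4.9, 4.9, 3.7, 3.6, 1.9, 1.9]


def journals_to_reach(values: List[float], threshold: float = 50.0) -> Optional[int]:
    """Return the smallest rank whose cumulative probability meets the threshold."""
    for rank, total in enumerate(accumulate(values), start=1):
        if total >= threshold:
            return rank
    return None  # threshold never reached with the values given


cutoff_rank = journals_to_reach(probs)
```

The running total is about 49.3% after six journals and about 53.0% after seven, so the cutoff falls at rank 7.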