Identifying a logical specification and a program for an LLM-based generator of lead molecules
Srinivasan, A.; Dash, T.; Baskar, A.; Dey, S. K.; Banerjee, M.
Our interest is in the generation of "lead" molecules in early-stage drug design. Leads are small molecules (ligands) that can bind to part of a pre-specified target and also satisfy multiple physico-chemical constraints. We propose using techniques developed in Inductive Logic Programming (ILP) to identify a logical specification of feasible molecules, and then using this specification to construct a program that uses a large language model (LLM) to generate new molecules. We ensure that the constructed program is correct, in the sense that every molecule it generates is feasible according to the specification. Our focus is on contributing to ongoing drug-discovery research on novel inhibitors of Dopamine β-hydroxylase (DBH), an enzyme that plays a pivotal role in several diseases of the brain and the heart. We find molecules comparable in affinity to the latest-generation drugs currently in clinical trials, and we provide a chemical assessment of the synthesisability of the generated molecules. For completeness, we also report results on the classic benchmark datasets used in the recent work reported in [1].
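The correctness guarantee described above (every generated molecule is feasible according to the specification) can be illustrated with a generate-and-check wrapper. This is a minimal sketch, not the authors' program: it assumes the specification can be modelled as a conjunction of property constraints, uses Lipinski-style thresholds as placeholders in place of an ILP-derived specification, and replaces the LLM with a hypothetical stub proposer. The names SPEC, is_feasible, correct_by_construction and make_stub_proposer are illustrative only.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical representation: a molecule is a dict of precomputed properties.
# A real system would derive these from a structure (e.g. a SMILES string).
Molecule = Dict[str, Any]

# A specification is modelled as a conjunction of constraints. This stands in
# for a learned logical theory and is NOT the specification from the paper.
Constraint = Callable[[Molecule], bool]

# Illustrative placeholder thresholds borrowed from Lipinski's rule of five.
SPEC: List[Constraint] = [
    lambda m: m["mw"] <= 500,   # molecular weight (Da)
    lambda m: m["logp"] <= 5,   # octanol-water partition coefficient
    lambda m: m["hbd"] <= 5,    # hydrogen-bond donors
]


def is_feasible(mol: Molecule, spec: List[Constraint]) -> bool:
    """A molecule is feasible iff it satisfies every constraint in spec."""
    return all(check(mol) for check in spec)


def correct_by_construction(
    propose: Callable[[], Optional[Molecule]],
    spec: List[Constraint],
    n: int,
    max_tries: int,
) -> List[Molecule]:
    """Collect up to n proposals, emitting only those that satisfy spec.

    Every returned molecule has passed is_feasible, so the output is correct
    with respect to spec no matter how the proposer (e.g. an LLM) behaves.
    """
    accepted: List[Molecule] = []
    for _ in range(max_tries):
        if len(accepted) >= n:
            break
        mol = propose()
        if mol is None:  # proposal source exhausted
            break
        if is_feasible(mol, spec):
            accepted.append(mol)
    return accepted


def make_stub_proposer(candidates: List[Molecule]) -> Callable[[], Optional[Molecule]]:
    """Hypothetical stand-in for an LLM: replays a fixed list of candidates."""
    it = iter(candidates)
    return lambda: next(it, None)


if __name__ == "__main__":
    candidates = [
        {"name": "A", "mw": 320.4, "logp": 2.1, "hbd": 1},
        {"name": "B", "mw": 612.7, "logp": 3.0, "hbd": 2},  # too heavy
        {"name": "C", "mw": 410.0, "logp": 5.8, "hbd": 0},  # too lipophilic
        {"name": "D", "mw": 288.3, "logp": 1.2, "hbd": 3},
    ]
    kept = correct_by_construction(make_stub_proposer(candidates), SPEC, n=5, max_tries=10)
    print([m["name"] for m in kept])  # ['A', 'D']
```

Checking at the output keeps correctness independent of the generator; the trade-off is that a generator poorly matched to the specification may need many tries to yield n feasible molecules.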