
BiLSTM-Powered Bilinear Attention for Protein-Ligand Prediction

Cheng, C.-Y.; Chen, Y.-A.; Li, F.-Y.; Re, S.

2026-05-13 bioinformatics
10.64898/2026.05.10.724184 bioRxiv

Rapid and accurate prediction of protein-ligand binding is essential for drug discovery. While generative AI has driven rapid advances in structure-based approaches, sequence-based methods remain significantly faster and more cost-effective. Here, we present a weakly supervised deep learning framework that integrates graph convolutional networks (GCN) for molecular encoding with bidirectional long short-term memory (BiLSTM) networks for protein modeling. The BiLSTM captures long-range dependencies in protein sequences better than the widely used convolutional neural network (CNN). Using a bilinear attention network (BAN), the model learns pairwise protein-ligand interactions without requiring three-dimensional structural supervision. Trained on the publicly available BindingDB dataset using affinity labels alone, the model classified binders and non-binders with an AUROC of 0.96 and an AUPRC of 0.95. The model generates interpretable attention maps that serve as a "GPS" for locating binding sites. Remarkably, despite the absence of structural training data, it pinpoints key contact residues confirmed by crystal structures. Our method could serve as a scalable filter for giga-scale libraries, enabling rapid screening of drug candidates with direct structural insight into the protein-ligand interface.
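The abstract describes the bilinear attention step only at a high level. The pure-Python sketch below shows one way pairwise residue-atom scores could become an attention map. It also shows how summing that map per residue ranks candidate contact residues, which is the "GPS" reading described above. It assumes a single full-rank bilinear form p_i^T W l_j with one softmax over all pairs, not whatever low-rank or multi-head BAN variant the authors actually use. The embeddings, the weight matrix W, and all function names are hypothetical. In the paper's framework, the residue vectors would come from the BiLSTM and the atom vectors from the GCN.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def bilinear_scores(protein: Matrix, ligand: Matrix, weight: Matrix) -> Matrix:
    """Score every residue-atom pair as p_i^T W l_j (hypothetical single bilinear form).

    protein: one embedding per residue (length d_p), e.g. BiLSTM outputs.
    ligand:  one embedding per atom (length d_l), e.g. GCN outputs.
    weight:  d_p x d_l matrix W.
    """
    # Project each atom through W once: projected[j][a] = sum_b W[a][b] * l_j[b]
    projected = [
        [sum(w_ab * l_b for w_ab, l_b in zip(row, atom)) for row in weight]
        for atom in ligand
    ]
    # score[i][j] = sum_a p_i[a] * projected[j][a]
    return [
        [sum(p_a * q_a for p_a, q_a in zip(residue, proj)) for proj in projected]
        for residue in protein
    ]


def pairwise_softmax(scores: Matrix) -> Matrix:
    """Normalise over all residue-atom pairs so the attention map sums to 1."""
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def rank_residues(attention: Matrix) -> List[Tuple[int, float]]:
    """Sum attention over ligand atoms for each residue; highest-weighted residues first."""
    per_residue = [(i, sum(row)) for i, row in enumerate(attention)]
    return sorted(per_residue, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up embeddings: 3 residues and 2 atoms in a 2-D space, identity W.
    protein = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    ligand = [[1.0, 0.0], [0.0, 0.0]]
    weight = [[1.0, 0.0], [0.0, 1.0]]
    attention = pairwise_softmax(bilinear_scores(protein, ligand, weight))
    print([i for i, _ in rank_residues(attention)])  # residue order: [0, 2, 1]
```

Normalising over all pairs, rather than per residue, keeps the map comparable across residues, so the per-residue sums can be read directly as a ranking. A trained model would learn W (or its low-rank factors) jointly with both encoders from the affinity labels alone.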

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Nature Communications (4913 papers in training set; Top 9%; 15.1%)
2. Advanced Science (249 papers in training set; Top 1%; 10.3%)
3. Nature Machine Intelligence (61 papers in training set; Top 0.2%; 8.6%)
4. Bioinformatics (1061 papers in training set; Top 4%; 5.0%)
5. Journal of Chemical Information and Modeling (207 papers in training set; Top 1.0%; 5.0%)
6. Communications Biology (886 papers in training set; Top 2%; 3.7%)
7. Briefings in Bioinformatics (326 papers in training set; Top 2%; 3.7%)

50% of probability mass above

8. Nature Methods (336 papers in training set; Top 3%; 3.7%)
9. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 25%; 2.7%)
10. Cell Systems (167 papers in training set; Top 5%; 2.7%)
11. Nature Biotechnology (147 papers in training set; Top 4%; 2.1%)
12. Science (429 papers in training set; Top 14%; 1.7%)
13. Nucleic Acids Research (1128 papers in training set; Top 10%; 1.7%)
14. Communications Chemistry (39 papers in training set; Top 0.2%; 1.7%)
15. Patterns (70 papers in training set; Top 0.9%; 1.7%)
16. Computational and Structural Biotechnology Journal (216 papers in training set; Top 5%; 1.4%)
17. Scientific Reports (3102 papers in training set; Top 63%; 1.4%)
18. Cell Research (49 papers in training set; Top 2%; 1.3%)
19. PLOS Computational Biology (1633 papers in training set; Top 19%; 1.3%)
20. Journal of Cheminformatics (25 papers in training set; Top 0.4%; 1.1%)
21. National Science Review (22 papers in training set; Top 2%; 0.9%)
22. eLife (5422 papers in training set; Top 53%; 0.9%)
23. iScience (1063 papers in training set; Top 25%; 0.9%)
24. Nature Chemical Biology (104 papers in training set; Top 4%; 0.8%)
25. Chemical Science (71 papers in training set; Top 2%; 0.8%)
26. Science Advances (1098 papers in training set; Top 29%; 0.8%)
27. Genome Medicine (154 papers in training set; Top 8%; 0.8%)
28. Cell Genomics (162 papers in training set; Top 7%; 0.7%)
29. Nano Letters (63 papers in training set; Top 3%; 0.7%)
30. Cell Reports Methods (141 papers in training set; Top 6%; 0.7%)