
DeepREAL: A Deep Learning Powered Multi-scale Modeling Framework Towards Predicting Out-of-distribution Receptor Activity of Ligand Binding

Cai, T.; Abbu, K. A.; Liu, Y.; Xie, L.

2021-09-15 bioinformatics
10.1101/2021.09.12.460001 bioRxiv

Drug discovery has seen intensive exploration of drug-target physical interactions over the past two decades. However, strong binding affinity of a drug to a single target often fails to translate into the desired clinical outcome. A critical knowledge gap must be filled to correlate drug-target interactions with phenotypic responses: predicting the receptor activity or functional selectivity upon ligand binding (i.e., agonist vs. antagonist) on a genome scale and for novel chemicals. Two major obstacles compound the difficulty in this direction: known receptor activity data are far too scarce to train a robust model for genome-scale applications, and real-world applications must deploy a model on data from various shifted distributions. To address these challenges, we have developed an end-to-end deep learning framework, DeepREAL, for multi-scale modeling of genome-wide receptor activities of ligand binding. DeepREAL uses self-supervised learning on tens of millions of protein sequences and pre-trained binary interaction classification to address the data distribution shift and data scarcity problems. Extensive benchmark studies that simulate real-world scenarios demonstrate that DeepREAL achieves state-of-the-art performance in out-of-distribution settings.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Nature Machine Intelligence | 61 papers in training set | Top 0.1% | 23.2%
2 | Nature Communications | 4913 papers in training set | Top 20% | 9.4%
3 | Cell Systems | 167 papers in training set | Top 2% | 7.0%
4 | Briefings in Bioinformatics | 326 papers in training set | Top 1% | 5.0%
5 | Advanced Science | 249 papers in training set | Top 5% | 3.7%
6 | Bioinformatics | 1061 papers in training set | Top 6% | 3.2%
50% of probability mass above
7 | Nature Methods | 336 papers in training set | Top 3% | 2.8%
8 | Patterns | 70 papers in training set | Top 0.4% | 2.7%
9 | Scientific Reports | 3102 papers in training set | Top 49% | 2.1%
10 | Genome Medicine | 154 papers in training set | Top 4% | 1.9%
11 | Nucleic Acids Research | 1128 papers in training set | Top 9% | 1.9%
12 | Cell Genomics | 162 papers in training set | Top 3% | 1.9%
13 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 31% | 1.7%
14 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 3% | 1.7%
15 | IEEE Transactions on Computational Biology and Bioinformatics | 17 papers in training set | Top 0.2% | 1.7%
16 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 1.5%
17 | Communications Biology | 886 papers in training set | Top 13% | 1.3%
18 | Journal of Cheminformatics | 25 papers in training set | Top 0.4% | 1.3%
19 | Science Advances | 1098 papers in training set | Top 24% | 1.1%
20 | eLife | 5422 papers in training set | Top 51% | 1.0%
21 | Nature Biomedical Engineering | 42 papers in training set | Top 1% | 0.9%
22 | Nature Biotechnology | 147 papers in training set | Top 6% | 0.9%
23 | Genome Research | 409 papers in training set | Top 4% | 0.8%
24 | Genome Biology | 555 papers in training set | Top 7% | 0.8%
25 | Quantitative Biology | 11 papers in training set | Top 0.6% | 0.8%
26 | PLOS Computational Biology | 1633 papers in training set | Top 23% | 0.8%
27 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 9% | 0.8%
28 | PLOS ONE | 4510 papers in training set | Top 67% | 0.8%
29 | Cell Research | 49 papers in training set | Top 2% | 0.8%
30 | Science Bulletin | 22 papers in training set | Top 0.7% | 0.7%
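
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration: the function name `journals_needed` is hypothetical, and the values are the first ten percentages copied from the list above, not output from the ranking model itself.

```python
# Predicted probabilities (in percent) for ranks 1-10, copied from the list above.
probabilities = [23.2, 9.4, 7.0, 5.0, 3.7, 3.2, 2.8, 2.7, 2.1, 1.9]


def journals_needed(probs, threshold=50.0):
    """Return (count, cumulative mass) for the smallest top-ranked prefix
    whose summed probability reaches the threshold (in percent).

    Returns (None, total) if the threshold is never reached.
    """
    cumulative = 0.0
    for count, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= threshold:
            return count, cumulative
    return None, cumulative


count, mass = journals_needed(probabilities)
print(count, round(mass, 1))
```

With these values the running sum first crosses 50% at the sixth entry, which matches the position of the "50% of probability mass above" marker in the list.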