
ProteomeScan: A Toolkit For Target Validation By Proteome-Wide Docking And Analysis

Barsainyan, A. A.; Panda, R.; Siguenza, J.; Merico, D.; Ramsundar, B.

2026-04-16 bioinformatics
10.64898/2026.04.14.718479 bioRxiv

The problem of identifying which protein target a potential drug-like molecule interacts with is crucial for both the study of existing drugs and the design of new therapeutic compounds. Despite the importance of target identification, existing computational approaches remain limited in terms of speed, accuracy, and protein target coverage. We introduce ProteomeScan, a large-scale, gene-driven computational toolkit for systematic proteome-wide scanning to uncover hidden or previously uncharacterized protein-ligand interactions. ProteomeScan leverages cloud-scale high-performance computing to perform extensive molecular docking simulations across the human proteome and rank candidate targets based on binding affinities. After filtering promiscuous targets, we found that ProteomeScan ranks known targets significantly better than a random baseline for a set of control compounds. Furthermore, we performed physical analyses of predicted binding modes for both promiscuous and known protein-ligand binding pairs to validate that ProteomeScan identifies interactions with valid binding pockets. In addition, we conducted experiments using mutant variants of proteins to study how mutations affect binding behavior. We have open-sourced the core ProteomeScan algorithm as part of the DeepChem ecosystem to enhance transparency and reproducibility.
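As a rough illustration of the ranking idea in the abstract (rank targets by docking score, drop promiscuous targets, compare the known target's rank with a random baseline), here is a minimal sketch. It is not the ProteomeScan or DeepChem API: the function names, the toy target names and scores, the user-supplied promiscuous set, and the convention that a more negative score means stronger predicted binding are all assumptions made for illustration.

```python
def rank_targets(scores, promiscuous=frozenset()):
    """Sort target names from strongest to weakest predicted binding.

    scores: dict of target name -> docking score (assumed convention:
            more negative means stronger predicted binding).
    promiscuous: hypothetical set of targets to drop before ranking.
    """
    kept = {t: s for t, s in scores.items() if t not in promiscuous}
    return sorted(kept, key=kept.get)


def known_target_rank(ranking, known):
    """1-based position of the known target in the ranking."""
    return ranking.index(known) + 1


def random_baseline_rank(n_targets):
    """Expected rank of a target placed uniformly at random among n_targets."""
    return (n_targets + 1) / 2


# Toy scores for one compound (made-up values, not results from the paper).
scores = {
    "TARGET_A": -9.1,   # treated as the known target in this toy example
    "TARGET_B": -7.4,
    "TARGET_C": -6.2,
    "TARGET_D": -8.0,
    "STICKY_X": -10.5,  # scores well against everything; treated as promiscuous
}

ranking = rank_targets(scores, promiscuous={"STICKY_X"})
rank = known_target_rank(ranking, "TARGET_A")
baseline = random_baseline_rank(len(ranking))
print(ranking, rank, baseline)
```

In this toy case the known target moves from rank 2 to rank 1 once the promiscuous target is filtered out, against an expected random rank of 2.5 among four targets. How promiscuous targets are actually identified, and how significance is tested, is not specified in the abstract.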

Matching journals

The top 2 journals together account for more than 50% of the predicted probability mass (33.2% + 22.7% = 55.9%).

1. Bioinformatics (1061 papers in training set; Top 0.6%; 33.2%)
2. Journal of Chemical Information and Modeling (207 papers in training set; Top 0.2%; 22.7%)
50% of probability mass above
3. Bioinformatics Advances (184 papers in training set; Top 0.4%; 6.4%)
4. Journal of Cheminformatics (25 papers in training set; Top 0.1%; 6.4%)
5. PLOS Computational Biology (1633 papers in training set; Top 12%; 2.8%)
6. Computational and Structural Biotechnology Journal (216 papers in training set; Top 3%; 2.4%)
7. Protein Science (221 papers in training set; Top 0.8%; 1.8%)
8. Journal of Molecular Biology (217 papers in training set; Top 1%; 1.8%)
9. PLOS ONE (4510 papers in training set; Top 53%; 1.7%)
10. BMC Bioinformatics (383 papers in training set; Top 4%; 1.7%)
11. Patterns (70 papers in training set; Top 1%; 1.5%)
12. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 38%; 1.2%)
13. Briefings in Bioinformatics (326 papers in training set; Top 5%; 1.1%)
14. Scientific Reports (3102 papers in training set; Top 70%; 0.9%)
15. Artificial Intelligence in the Life Sciences (11 papers in training set; Top 0.2%; 0.8%)
16. Nature Communications (4913 papers in training set; Top 64%; 0.7%)
17. Journal of Chemical Theory and Computation (126 papers in training set; Top 0.9%; 0.6%)
18. Journal of Proteome Research (215 papers in training set; Top 2%; 0.6%)
19. iScience (1063 papers in training set; Top 37%; 0.6%)
20. Frontiers in Molecular Biosciences (100 papers in training set; Top 6%; 0.6%)
21. Cell Systems (167 papers in training set; Top 15%; 0.5%)