
BioBrigit, A Hybrid Deep Learning and Knowledge-based Approach to Model Metal Pathways in Proteins: Application to a Di-Copper Tyrosinase

Marechal, J.-D.; Sodupe, M.; Sanchez, J. E.; Fernandez Diaz, R.; Roldan Martin, L.

2024-09-22 bioinformatics
10.1101/2024.09.19.613875 bioRxiv

The interaction of metallic species with proteins has been fundamental in evolution and is key to many physiological processes. Understanding how metals bind to proteins also holds promise in many fields, such as the design of new biocatalysts or the fight against pathogens. Nonetheless, the mechanisms by which proteins recruit metal ions are far from understood and remain one of the challenges in bioinorganic chemistry and structural biology. Computational methods are potentially among the most promising tools for this endeavor. Only a handful of efficient structural predictors of metal-binding sites exist to date, and most focus on identifying the most stable binding sites in protein scaffolds. Although useful, these methods do not explore transient, sub-optimal binding sites that could be relevant to metal-binding pathways in proteins. Pushing the current limits of modeling capabilities, we introduce BioBrigit, a hybrid deep learning and knowledge-based approach that suggests metal-binding pathways in proteins. To demonstrate the method's viability, we apply it to the di-copper tyrosinase from Streptomyces castaneoglobisporus, a system for which crystallographic experiments identified a series of transient copper sites along the metal's path from a chaperone to the final catalytic site. Combined with homology modeling and large-scale molecular dynamics, BioBrigit computationally characterizes all experimentally observed sites and provides a better understanding of the copper recruitment mechanism. BioBrigit emerges as an asset in a field with many unknowns, such as metal binding to proteins, and opens the way to further algorithms in this area. Source code, documentation, and data are available at https://github.com/insilichem/BioBrigit

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling (207 papers in training set; Top 0.4%; predicted probability 14.6%)
2. Journal of Cheminformatics (25 papers in training set; Top 0.1%; predicted probability 12.7%)
3. Bioinformatics (1061 papers in training set; Top 3%; predicted probability 10.6%)
4. PLOS Computational Biology (1633 papers in training set; Top 4%; predicted probability 7.3%)
5. Protein Science (221 papers in training set; Top 0.2%; predicted probability 4.9%)
50% of probability mass above
6. Nature Communications (4913 papers in training set; Top 32%; predicted probability 4.9%)
7. Chemical Science (71 papers in training set; Top 0.3%; predicted probability 3.7%)
8. Journal of Chemical Theory and Computation (126 papers in training set; Top 0.4%; predicted probability 2.6%)
9. Proteins: Structure, Function, and Bioinformatics (82 papers in training set; Top 0.3%; predicted probability 2.1%)
10. Computational and Structural Biotechnology Journal (216 papers in training set; Top 3%; predicted probability 2.1%)
11. Structure (175 papers in training set; Top 1%; predicted probability 1.9%)
12. Journal of Molecular Biology (217 papers in training set; Top 1%; predicted probability 1.9%)
13. Frontiers in Molecular Biosciences (100 papers in training set; Top 1%; predicted probability 1.9%)
14. Scientific Reports (3102 papers in training set; Top 55%; predicted probability 1.8%)
15. Bioinformatics Advances (184 papers in training set; Top 3%; predicted probability 1.7%)
16. Communications Chemistry (39 papers in training set; Top 0.4%; predicted probability 1.4%)
17. Briefings in Bioinformatics (326 papers in training set; Top 5%; predicted probability 1.2%)
18. Communications Biology (886 papers in training set; Top 14%; predicted probability 1.2%)
19. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 37%; predicted probability 1.2%)
20. Nucleic Acids Research (1128 papers in training set; Top 14%; predicted probability 1.1%)
21. Artificial Intelligence in the Life Sciences (11 papers in training set; Top 0.1%; predicted probability 1.0%)
22. Journal of Computational Chemistry (11 papers in training set; Top 0.1%; predicted probability 1.0%)
23. Patterns (70 papers in training set; Top 2%; predicted probability 0.9%)
24. BMC Bioinformatics (383 papers in training set; Top 6%; predicted probability 0.8%)
25. PeerJ (261 papers in training set; Top 16%; predicted probability 0.7%)
26. Nature Machine Intelligence (61 papers in training set; Top 4%; predicted probability 0.7%)
27. eLife (5422 papers in training set; Top 61%; predicted probability 0.7%)
28. Cell Reports Methods (141 papers in training set; Top 6%; predicted probability 0.7%)
29. NAR Genomics and Bioinformatics (214 papers in training set; Top 5%; predicted probability 0.5%)
30. Cell Systems (167 papers in training set; Top 14%; predicted probability 0.5%)
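The claim that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities until the running total reaches 50%. The short Python sketch below does this. The percentages are copied from the list and are rounded to one decimal, so the running total is only approximate. The helper name `journals_to_reach` is hypothetical and is not part of any tool behind this page.

```python
# Predicted probabilities (%) for the top-ranked journals, as listed on the page.
# Values are rounded to one decimal, so cumulative sums are approximate.
probs = [
    ("Journal of Chemical Information and Modeling", 14.6),
    ("Journal of Cheminformatics", 12.7),
    ("Bioinformatics", 10.6),
    ("PLOS Computational Biology", 7.3),
    ("Protein Science", 4.9),
    ("Nature Communications", 4.9),
]


def journals_to_reach(ranked, threshold=50.0):
    """Hypothetical helper: return (count, running_total) for the smallest
    number of top-ranked journals whose probabilities sum to >= threshold."""
    total = 0.0
    for count, (_, p) in enumerate(ranked, start=1):
        total += p
        if total >= threshold:
            return count, total
    # Threshold never reached with the entries provided.
    return None, total


count, total = journals_to_reach(probs)
print(count, round(total, 1))  # 5 50.1
```

The running totals are about 14.6, 27.3, 37.9, 45.2, and 50.1, so the 50% line falls right after the fifth entry, matching the "50% of probability mass above" marker.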