Can a Sparse 29 x 29 Pixel Chaos Game Representation Predict Protein Binding Sites using Fine-Tuned State-of-the-Art Deep Learning Semantic Segmentation Models?

Dick, K.; Green, J. R.

2023-08-04 bioinformatics
10.1101/2023.08.04.410498 bioRxiv
No. While our experiments ultimately failed, this work was motivated by the seemingly reasonable hypothesis that encoding protein sequences as a fractal-based image, in combination with a binary mask identifying those pixels representative of the protein binding interface, could effectively be used to fine-tune a semantic segmentation model. We were wrong. Despite the shortcomings of this work, a number of insights were drawn, inspiring discussion about how this fractal-based space may be exploited to generate effective protein binding site predictors in the future. Furthermore, these realizations promise to orient complementary studies leveraging fractal-based representations, whether in the field of bioinformatics or more broadly within disparate fields leveraging sequence-type data, such as Natural Language Processing. In a non-traditional way, this work presents the experimental design undertaken and interleaves various insights and limitations. We hope that those interested in leveraging fractal-based representations and deep learning architectures will benefit from these insights.
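
To make the encoding described in the abstract concrete, here is a minimal sketch of how a protein sequence might be turned into a sparse chaos game image with a matching binary mask. It is not the authors' pipeline: the 20-vertex polygon layout, the amino acid ordering, the midpoint rule, and the function names `to_pixel` and `cgr_image_and_mask` are all assumptions made for illustration. Only the 29 x 29 grid size comes from the title.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues
SIZE = 29  # grid size taken from the title

# Assumption: one vertex per amino acid, evenly spaced on the unit circle.
VERTICES = {
    aa: (math.cos(2 * math.pi * i / len(AMINO_ACIDS)),
         math.sin(2 * math.pi * i / len(AMINO_ACIDS)))
    for i, aa in enumerate(AMINO_ACIDS)
}


def to_pixel(x, y, size=SIZE):
    """Map a point in [-1, 1] x [-1, 1] to integer (row, col) indices."""
    col = min(size - 1, int((x + 1) / 2 * size))
    row = min(size - 1, int((1 - y) / 2 * size))
    return row, col


def cgr_image_and_mask(sequence, binding_positions, size=SIZE):
    """Return (image, mask) as size x size lists of 0/1 values.

    image marks every pixel visited by the chaos game walk;
    mask marks only pixels reached at the given 0-based binding positions.
    """
    image = [[0] * size for _ in range(size)]
    mask = [[0] * size for _ in range(size)]
    x, y = 0.0, 0.0  # start at the centre of the polygon
    for pos, aa in enumerate(sequence):
        vx, vy = VERTICES[aa]
        # Chaos game step: move halfway towards the residue's vertex.
        x, y = (x + vx) / 2, (y + vy) / 2
        row, col = to_pixel(x, y, size)
        image[row][col] = 1
        if pos in binding_positions:
            mask[row][col] = 1
    return image, mask
```

Under these assumptions, a sequence of N residues sets at most N of the 841 pixels, which gives a sense of why such an image is sparse. For example, `cgr_image_and_mask("ACDA", {1})` yields an image with at most four active pixels and a mask with exactly one.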

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

Rank. Journal | Papers in training set | Percentile | Predicted probability

1. Frontiers in Bioinformatics | 45 papers in training set | Top 0.1% | 9.1%
2. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 0.3% | 8.3%
3. Journal of Chemical Information and Modeling | 207 papers in training set | Top 0.9% | 6.3%
4. BMC Bioinformatics | 383 papers in training set | Top 2% | 4.8%
5. Bioinformatics | 1061 papers in training set | Top 5% | 3.9%
6. IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.1% | 3.9%
7. Briefings in Bioinformatics | 326 papers in training set | Top 2% | 3.9%
8. GigaScience | 172 papers in training set | Top 0.5% | 3.6%
9. PLOS Computational Biology | 1633 papers in training set | Top 10% | 3.6%
10. PLOS ONE | 4510 papers in training set | Top 43% | 2.9%

50% of probability mass above

11. Bioinformatics Advances | 184 papers in training set | Top 2% | 2.9%
12. NAR Genomics and Bioinformatics | 214 papers in training set | Top 1% | 2.7%
13. Biology Methods and Protocols | 53 papers in training set | Top 0.5% | 2.6%
14. Computers in Biology and Medicine | 120 papers in training set | Top 1% | 2.6%
15. Frontiers in Genetics | 197 papers in training set | Top 4% | 2.1%
16. Computational Biology and Chemistry | 23 papers in training set | Top 0.1% | 2.1%
17. Scientific Reports | 3102 papers in training set | Top 51% | 2.1%
18. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 2% | 1.7%
19. PeerJ | 261 papers in training set | Top 8% | 1.6%
20. BioData Mining | 15 papers in training set | Top 0.5% | 1.2%
21. Artificial Intelligence in Medicine | 15 papers in training set | Top 0.5% | 1.1%
22. Molecules | 37 papers in training set | Top 1% | 1.1%
23. Expert Systems with Applications | 11 papers in training set | Top 0.3% | 0.9%
24. International Journal of Molecular Sciences | 453 papers in training set | Top 13% | 0.9%
25. IEEE Access | 31 papers in training set | Top 0.9% | 0.8%
26. Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 1.0% | 0.7%
27. Artificial Intelligence in the Life Sciences | 11 papers in training set | Top 0.3% | 0.7%
28. Journal of Computational Biology | 37 papers in training set | Top 0.6% | 0.7%
29. Journal of Cheminformatics | 25 papers in training set | Top 0.6% | 0.7%
30. ImmunoInformatics | 11 papers in training set | Top 0.2% | 0.7%