
Research on protein structure prediction and folding based on novel remote homologs recognition

Zhao, K.; Xia, Y.; Zhang, F.; Zhou, X.; Li, S. Z.; Zhang, G.

2022-10-20 bioinformatics
10.1101/2022.10.16.512404 bioRxiv

Recognition of remote homologous structures is a necessary module in AlphaFold2 and is also essential for exploring protein folding pathways. Here, we developed a new method, PAthreader, which identifies remote homologous structures through a three-track alignment of distance profiles and structure profiles, derived from PDB and AlphaFold DB by deep learning. Using the identified templates, we further enhanced a state-of-the-art modelling method and explored protein folding pathways based on the residue frequency distribution of homologs and the secondary structure. The results show that, on 551 nonredundant proteins, the average accuracy of templates identified by PAthreader is 11.6% higher than that of templates identified by HHsearch. In terms of structure modelling, PAthreader improves the performance of AlphaFold2 and has ranked first in the CAMEO blind test for the last three months. Furthermore, we explored protein folding pathways for 37 proteins. The results are largely consistent with biological experiments for 7 proteins, while those for the remaining 30 human proteins have yet to be verified by biological experiments, indicating that folding information can be extracted from remote homologous structures.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Briefings in Bioinformatics: 326 papers in training set, Top 0.1%, 28.2%
2. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 0.5%, 10.3%
3. Bioinformatics: 1061 papers in training set, Top 3%, 7.3%
4. Nature Communications: 4913 papers in training set, Top 35%, 4.4%

50% of probability mass above

5. PLOS Computational Biology: 1633 papers in training set, Top 9%, 3.7%
6. BMC Bioinformatics: 383 papers in training set, Top 3%, 2.7%
7. Scientific Reports: 3102 papers in training set, Top 46%, 2.5%
8. Journal of Genetics and Genomics: 36 papers in training set, Top 0.6%, 2.4%
9. Communications Biology: 886 papers in training set, Top 5%, 2.1%
10. Journal of Structural Biology: 58 papers in training set, Top 0.6%, 1.8%
11. Journal of Molecular Biology: 217 papers in training set, Top 2%, 1.7%
12. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 4%, 1.7%
13. PLOS ONE: 4510 papers in training set, Top 56%, 1.5%
14. Nucleic Acids Research: 1128 papers in training set, Top 13%, 1.2%
15. Journal of Chemical Information and Modeling: 207 papers in training set, Top 2%, 1.2%
16. Computers in Biology and Medicine: 120 papers in training set, Top 3%, 1.2%
17. Nature Machine Intelligence: 61 papers in training set, Top 3%, 1.1%
18. Computational Biology and Chemistry: 23 papers in training set, Top 0.3%, 1.1%
19. Advanced Science: 249 papers in training set, Top 15%, 1.0%
20. Science Bulletin: 22 papers in training set, Top 0.6%, 1.0%
21. Quantitative Biology: 11 papers in training set, Top 0.5%, 1.0%
22. Frontiers in Genetics: 197 papers in training set, Top 9%, 0.8%
23. Frontiers in Molecular Biosciences: 100 papers in training set, Top 5%, 0.7%
24. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 32 papers in training set, Top 0.6%, 0.7%
25. Proteins: Structure, Function, and Bioinformatics: 82 papers in training set, Top 1%, 0.7%
26. Science China Life Sciences: 26 papers in training set, Top 2%, 0.7%
27. National Science Review: 22 papers in training set, Top 3%, 0.7%
28. Database: 51 papers in training set, Top 1%, 0.7%
29. Cell Research: 49 papers in training set, Top 3%, 0.7%
30. Protein Science: 221 papers in training set, Top 2%, 0.7%
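
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below is illustrative only: the function name `journals_needed` is hypothetical, the probabilities are the rounded percentages listed for the ten highest-ranked journals, and the page may compute its cutoff from unrounded values.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the ten highest-ranked journals,
# copied from the list; these are rounded values as displayed.
probs = [28.2, 10.3, 7.3, 4.4, 3.7, 2.7, 2.5, 2.4, 2.1, 1.8]


def journals_needed(probabilities, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold` percent, or None if it is never reached."""
    for count, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return count
    return None


top_k = journals_needed(probs)
print(top_k)
```

With the listed values the running total passes 50% at the fourth journal (28.2 + 10.3 + 7.3 + 4.4), which agrees with the position of the "50% of probability mass above" marker.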