Random, de novo and conserved proteins: How structure and disorder predictors perform differently

Middendorf, L.; Eicholt, L. A.

2023-07-19 bioinformatics
10.1101/2023.07.18.549582 bioRxiv

Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advancements in protein structure prediction, particularly with AlphaFold2 (AF2), have expanded our knowledge of protein structures, but the applicability of these predictors to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and the protein language model (pLM)-based predictor ESMFold for de novo, random, and conserved proteins. We find that the structural predictions for de novo and random proteins differ significantly from those for conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe varying predicted disorder among different sequence length quartiles for random proteins, suggesting an influence of sequence length on disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | PLOS Computational Biology | 1633 papers in training set | Top 2% | 14.1%
2 | Protein Science | 221 papers in training set | Top 0.1% | 8.3%
3 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 0.7% | 8.1%
4 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 0.4% | 6.7%
5 | Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.1% | 6.3%
6 | Computational Biology and Chemistry | 23 papers in training set | Top 0.1% | 6.2%
7 | Bioinformatics | 1061 papers in training set | Top 5% | 4.2%
50% of probability mass above
8 | Scientific Reports | 3102 papers in training set | Top 28% | 4.2%
9 | The Journal of Physical Chemistry B | 158 papers in training set | Top 0.5% | 3.6%
10 | Biophysical Journal | 545 papers in training set | Top 2% | 3.5%
11 | Journal of Structural Biology | 58 papers in training set | Top 0.4% | 3.5%
12 | Journal of Molecular Biology | 217 papers in training set | Top 1% | 2.0%
13 | PLOS ONE | 4510 papers in training set | Top 49% | 2.0%
14 | Journal of Chemical Theory and Computation | 126 papers in training set | Top 0.5% | 1.9%
15 | International Journal of Molecular Sciences | 453 papers in training set | Top 8% | 1.7%
16 | PeerJ | 261 papers in training set | Top 9% | 1.5%
17 | International Journal of Biological Macromolecules | 65 papers in training set | Top 3% | 0.9%
18 | ACS Omega | 90 papers in training set | Top 3% | 0.9%
19 | Structure | 175 papers in training set | Top 3% | 0.9%
20 | Current Research in Structural Biology | 11 papers in training set | Top 0.1% | 0.9%
21 | Computers in Biology and Medicine | 120 papers in training set | Top 4% | 0.8%
22 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 5% | 0.7%
23 | Frontiers in Bioinformatics | 45 papers in training set | Top 1.0% | 0.7%
24 | eLife | 5422 papers in training set | Top 60% | 0.7%
25 | Frontiers in Genetics | 197 papers in training set | Top 11% | 0.7%
26 | BMC Bioinformatics | 383 papers in training set | Top 8% | 0.6%
27 | Communications Chemistry | 39 papers in training set | Top 2% | 0.6%