Residue burial encodes a protein's fold

Grigas, A. T.; Sumner, J.; O'Hern, C. S.

2026-03-31 biophysics
10.64898/2026.03.28.714986 bioRxiv

Protein structure is controlled by a high-dimensional energy landscape, which is a function of all of the atomic coordinates of the protein. Can this landscape be accurately described by a low-dimensional representation? We find that residue core identity, a binary N-dimensional encoding indicating whether each of the N amino acids in a protein is buried in the core or not, can predict the protein's backbone conformation more efficiently than all other representations that we tested. Core identity is 4 times more efficient than previous estimates of the bits per residue needed to encode a protein's native fold, 2 times more efficient than the Cα contact map, and 1.5 times more efficient than the machine-learned embeddings from Foldseek's 3Di. Even when the folded structure is unavailable, predicting each residue's burial from sequence yields a more accurate estimate of fold quality than predicting pairwise contacts from the same sequence information. Thus, this work emphasizes that the problem of determining a protein's native fold can be reframed as predicting each residue's core identity.
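
To make the encoding concrete, the following minimal sketch builds a binary core-identity vector from per-residue burial values. It is an illustration only: the abstract does not state how burial is measured or thresholded, so the use of relative solvent accessibility, the function name `core_identity`, the cutoff of 0.1, and the example values are all assumptions rather than the authors' method.

```python
from typing import Sequence


def core_identity(rsa: Sequence[float], threshold: float = 0.1) -> list[int]:
    """Return 1 for residues treated as buried, else 0.

    Assumption: a residue counts as buried when its relative solvent
    accessibility is at or below `threshold`. The cutoff is hypothetical.
    """
    return [1 if value <= threshold else 0 for value in rsa]


# Hypothetical per-residue relative solvent accessibility values (not real data).
example_rsa = [0.62, 0.05, 0.00, 0.33, 0.08, 0.71]

encoding = core_identity(example_rsa)
print(encoding)  # one binary entry per residue
```

Because the vector holds one binary entry per residue, a protein of N residues is described by at most N bits, that is, at most 1 bit per residue.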

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 1% | 18.9%
2 | Nature Communications | 4913 papers in training set | Top 9% | 14.9%
3 | Cell Systems | 167 papers in training set | Top 1.0% | 10.2%
4 | PLOS Computational Biology | 1633 papers in training set | Top 7% | 4.9%
5 | Scientific Reports | 3102 papers in training set | Top 23% | 4.9%
50% of probability mass above
6 | Nature Methods | 336 papers in training set | Top 2% | 4.4%
7 | Nature Computational Science | 50 papers in training set | Top 0.2% | 3.1%
8 | Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.3% | 2.1%
9 | Science | 429 papers in training set | Top 12% | 2.1%
10 | Communications Biology | 886 papers in training set | Top 11% | 1.5%
11 | Biophysical Journal | 545 papers in training set | Top 3% | 1.5%
12 | Structure | 175 papers in training set | Top 2% | 1.3%
13 | PLOS ONE | 4510 papers in training set | Top 58% | 1.3%
14 | Bioinformatics Advances | 184 papers in training set | Top 4% | 1.2%
15 | Physical Review X | 23 papers in training set | Top 0.3% | 1.2%
16 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 1.2%
17 | Nucleic Acids Research | 1128 papers in training set | Top 13% | 1.2%
18 | The Journal of Physical Chemistry B | 158 papers in training set | Top 1% | 1.2%
19 | Nature | 575 papers in training set | Top 13% | 1.2%
20 | IUCrJ | 29 papers in training set | Top 0.3% | 1.1%
21 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 3% | 1.1%
22 | PRX Life | 34 papers in training set | Top 0.6% | 1.0%
23 | Bioinformatics | 1061 papers in training set | Top 8% | 1.0%
24 | Entropy | 20 papers in training set | Top 0.3% | 1.0%
25 | Nature Biotechnology | 147 papers in training set | Top 7% | 0.8%
26 | Journal of The Royal Society Interface | 189 papers in training set | Top 4% | 0.8%
27 | Physical Biology | 43 papers in training set | Top 2% | 0.8%
28 | Physical Review Research | 46 papers in training set | Top 1% | 0.7%
29 | Physical Review Letters | 43 papers in training set | Top 0.7% | 0.7%
30 | iScience | 1063 papers in training set | Top 37% | 0.7%