Protein inverse folding through joint modeling of surface and backbone geometry

Hong, Y.; Cai, Y.; Jiao, Y.; Qi, M.; Huang, Q.; Sun, L.

2026-04-22 bioinformatics
10.64898/2026.04.20.719544 bioRxiv
Inverse protein folding aims to generate amino acid sequences compatible with a given protein structure. While recent deep learning methods have achieved strong performance by conditioning on residue-level backbone geometry, backbone-only representations insufficiently constrain surface-exposed residues and thus incompletely capture the structural determinants of sequence identity. Here we propose Surleton, a structure-aware inverse folding framework that jointly models backbone geometry and protein surface organization. By integrating complementary surface geometric information, Surleton refines the conditional sequence distribution and improves the balance of sequence modeling across buried and exposed residues. On the CATH4.2 and SCOPe benchmarks, Surleton consistently outperforms backbone-only baselines in sequence recovery, sequence similarity, and predictive confidence, with especially strong improvements on surface-exposed residues. Together, these findings indicate that protein surface geometry serves as a complementary source of structural constraint and that surface-aware modeling may provide a promising direction for improving inverse protein folding.
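
The abstract reports sequence recovery separately for buried and surface-exposed residues. As a rough illustration, the sketch below uses the common definition of sequence recovery (the fraction of positions where the designed residue matches the native one) and restricts it to a subset of positions with a boolean mask. The function name `sequence_recovery`, the toy sequences, and the `exposed` mask are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
import math


def sequence_recovery(native, designed, mask=None):
    """Fraction of selected positions where designed matches native.

    mask is an optional list of booleans; only positions marked True
    are counted. With no mask, every position is counted.
    """
    if len(native) != len(designed):
        raise ValueError("native and designed must have the same length")
    if mask is None:
        mask = [True] * len(native)
    if len(mask) != len(native):
        raise ValueError("mask must have one entry per residue")

    selected = [(a, b) for a, b, keep in zip(native, designed, mask) if keep]
    if not selected:
        return math.nan  # no positions selected, recovery is undefined
    return sum(a == b for a, b in selected) / len(selected)


# Toy example (made-up sequences and exposure labels).
native = "MKTAYIAK"
designed = "MKSAYLAK"
exposed = [True, True, True, False, False, False, True, True]
buried = [not e for e in exposed]

overall = sequence_recovery(native, designed)
on_surface = sequence_recovery(native, designed, exposed)
in_core = sequence_recovery(native, designed, buried)
print(overall, on_surface, in_core)
```

In practice the exposure mask would come from a solvent-accessibility calculation with some chosen threshold; that choice is an assumption and is left out of this sketch.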

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Nature Communications | 4913 | Top 2% | 22.8% |
| 2 | Bioinformatics | 1061 | Top 3% | 10.6% |
| 3 | Proceedings of the National Academy of Sciences | 2130 | Top 9% | 7.3% |
| 4 | Cell Systems | 167 | Top 2% | 4.9% |
| 5 | Briefings in Bioinformatics | 326 | Top 1% | 4.4% |
| | 50% of probability mass above | | | |
| 6 | Nature Biotechnology | 147 | Top 2% | 4.0% |
| 7 | Nature Machine Intelligence | 61 | Top 0.8% | 4.0% |
| 8 | PLOS Computational Biology | 1633 | Top 9% | 3.6% |
| 9 | Nature Methods | 336 | Top 3% | 3.3% |
| 10 | Protein Science | 221 | Top 0.5% | 2.6% |
| 11 | Scientific Reports | 3102 | Top 45% | 2.6% |
| 12 | Bioinformatics Advances | 184 | Top 2% | 2.1% |
| 13 | Nucleic Acids Research | 1128 | Top 8% | 2.1% |
| 14 | Journal of Chemical Information and Modeling | 207 | Top 2% | 1.9% |
| 15 | Communications Biology | 886 | Top 8% | 1.7% |
| 16 | Journal of Structural Biology | 58 | Top 0.8% | 1.7% |
| 17 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 0.6% | 1.2% |
| 18 | Advanced Science | 249 | Top 15% | 1.0% |
| 19 | Structure | 175 | Top 3% | 0.9% |
| 20 | eLife | 5422 | Top 55% | 0.8% |
| 21 | Nature Computational Science | 50 | Top 1% | 0.8% |
| 22 | Journal of Molecular Biology | 217 | Top 3% | 0.8% |
| 23 | Cell Reports Methods | 141 | Top 5% | 0.8% |
| 24 | Computational and Structural Biotechnology Journal | 216 | Top 10% | 0.7% |
| 25 | Science | 429 | Top 20% | 0.7% |
| 26 | PLOS ONE | 4510 | Top 71% | 0.7% |
| 27 | ACS Synthetic Biology | 256 | Top 3% | 0.7% |
| 28 | The Journal of Physical Chemistry B | 158 | Top 2% | 0.5% |
| 29 | NAR Genomics and Bioinformatics | 214 | Top 5% | 0.5% |
| 30 | BMC Bioinformatics | 383 | Top 8% | 0.5% |
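
The 50% cutoff stated above can be checked with a running sum over the ranked probabilities. The sketch below copies the first ten percentages from the list and finds the smallest number of top-ranked journals whose cumulative share reaches a threshold. The helper name `smallest_top_k` and the small floating-point tolerance are assumptions made for illustration, not part of the page.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1 to 10, copied from the list.
top_probabilities = [22.8, 10.6, 7.3, 4.9, 4.4, 4.0, 4.0, 3.6, 3.3, 2.6]


def smallest_top_k(percentages, threshold, tol=1e-9):
    """Return (k, cumulative) for the first k entries reaching threshold.

    tol absorbs floating-point rounding in the running sum.
    Returns None if the threshold is never reached.
    """
    for k, total in enumerate(accumulate(percentages), start=1):
        if total >= threshold - tol:
            return k, total
    return None


result = smallest_top_k(top_probabilities, 50.0)
print(result)
```

With these values the running sum is 22.8, 33.4, 40.7, 45.6, then 50.0 at the fifth journal, which agrees with the statement that the top 5 journals account for 50% of the predicted probability mass.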