Protein inverse folding through joint modeling of surface and backbone geometry

Hong, Y.; Cai, Y.; Jiao, Y.; Qi, M.; Huang, Q.; Sun, L.

2026-04-22 bioinformatics

10.64898/2026.04.20.719544 bioRxiv

Inverse protein folding aims to generate amino acid sequences compatible with a given protein structure. While recent deep learning methods have achieved strong performance by conditioning on residue-level backbone geometry, backbone-only representations insufficiently constrain surface-exposed residues and thus incompletely capture the structural determinants of sequence identity. Here we propose Surleton, a structure-aware inverse folding framework that jointly models backbone geometry and protein surface organization. By integrating complementary surface geometric information, Surleton refines the conditional sequence distribution and improves the balance of sequence modeling across buried and exposed residues. On the CATH 4.2 and SCOPe benchmarks, Surleton consistently outperforms backbone-only baselines in sequence recovery, sequence similarity, and predictive confidence, with especially strong improvements on surface-exposed residues. Together, these findings indicate that protein surface geometry serves as a complementary source of structural constraint and that surface-aware modeling may provide a promising direction for improving inverse protein folding.
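
The abstract reports sequence recovery overall and on surface-exposed residues. As an illustration only, the sketch below shows one common way such a metric could be computed: the fraction of positions where a designed sequence matches the native one, split by an exposure mask. This is not Surleton's evaluation code. The function names and the `exposed` mask are hypothetical, and the mask is assumed to be supplied by the caller (for example, from a solvent-accessibility threshold of their choosing).

```python
from typing import Dict, Optional, Sequence


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def recovery_by_exposure(
    native: str, designed: str, exposed: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Recovery computed separately over exposed and buried positions.

    `exposed` is a hypothetical per-residue mask (True = surface-exposed).
    A group with no residues yields None rather than a misleading 0.0.
    """
    if not (len(native) == len(designed) == len(exposed)):
        raise ValueError("sequences and mask must have equal length")

    def _recovery(flag: bool) -> Optional[float]:
        # Collect the (native, designed) pairs belonging to this group.
        pairs = [(a, b) for a, b, e in zip(native, designed, exposed) if e == flag]
        if not pairs:
            return None
        return sum(a == b for a, b in pairs) / len(pairs)

    return {"exposed": _recovery(True), "buried": _recovery(False)}
```

With a toy pair such as `native = "ACDEFG"` and `designed = "ACDQFA"`, four of six positions match overall. Splitting by a mask makes visible whether the mismatches are concentrated at exposed positions, which is the kind of imbalance the abstract describes for backbone-only models.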

Matching journals

● Non-profit ◐ University press ○ Commercial

The top 5 journals account for 50% of the predicted probability mass.

Nature Communications

○ 4913 papers in training set

◐ 1061 papers in training set

Proceedings of the National Academy of Sciences

● 2130 papers in training set

○ 167 papers in training set

Briefings in Bioinformatics

◐ 326 papers in training set

50% of probability mass above

Nature Biotechnology

○ 147 papers in training set

Nature Machine Intelligence

○ 61 papers in training set

PLOS Computational Biology

● 1633 papers in training set

○ 336 papers in training set

Protein Science

○ 221 papers in training set

Scientific Reports

○ 3102 papers in training set

Bioinformatics Advances

◐ 184 papers in training set

Nucleic Acids Research

◐ 1128 papers in training set

Journal of Chemical Information and Modeling

● 207 papers in training set

Communications Biology

○ 886 papers in training set

Journal of Structural Biology

○ 58 papers in training set

Proteins: Structure, Function, and Bioinformatics

○ 82 papers in training set

Advanced Science

○ 249 papers in training set

○ 175 papers in training set

● 5422 papers in training set

Nature Computational Science

○ 50 papers in training set

Journal of Molecular Biology

○ 217 papers in training set

Cell Reports Methods

○ 141 papers in training set

Computational and Structural Biotechnology Journal

● 216 papers in training set

● 429 papers in training set

● 4510 papers in training set

ACS Synthetic Biology

● 256 papers in training set

The Journal of Physical Chemistry B

● 158 papers in training set

NAR Genomics and Bioinformatics

◐ 214 papers in training set

BMC Bioinformatics

○ 383 papers in training set