Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles

Ni, Z.; Li, Y.; Qiu, Z.; Schölkopf, B.; Guo, H.; Liu, W.; Liu, S.

2026-03-04 bioinformatics
10.64898/2026.03.02.708991 bioRxiv

Abstract: Generative models have recently advanced de novo protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) existing methods cannot jointly learn protein geometry and design tasks, a gap that pretraining can address; (2) current pretraining methods mostly rely on local, non-rigid atomic representations aimed at downstream property-prediction tasks, which limits the global geometric understanding needed for protein generation; and (3) existing approaches have yet to effectively model the rich dynamic and conformational information of protein structures. To overcome these issues, we introduce RigidSSL (Rigidity-Aware Self-Supervised Learning), a geometric pretraining framework that front-loads geometry learning prior to generative finetuning. Phase I (RigidSSL-Perturb) learns geometric priors from 432K structures from the AlphaFold Protein Structure Database with simulated perturbations. Phase II (RigidSSL-MD) refines these representations on 1.3K molecular dynamics trajectories to capture physically realistic transitions. Underpinning both phases is a bi-directional, rigidity-aware flow matching objective that jointly optimizes translational and rotational dynamics to maximize mutual information between conformations. Empirically, RigidSSL variants improve designability by up to 43% while enhancing novelty and diversity in unconditional generation. Furthermore, RigidSSL-Perturb improves the success rate by 5.8% in zero-shot motif scaffolding, and RigidSSL-MD captures more biophysically realistic conformational ensembles in G protein-coupled receptor modeling. The code is available on this repository.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Bioinformatics | 1061 | Top 1% | 23.0% |
| 2 | Nature Methods | 336 | Top 2% | 6.5% |
| 3 | Nature Machine Intelligence | 61 | Top 0.4% | 6.5% |
| 4 | Cell Systems | 167 | Top 2% | 6.5% |
| 5 | Nature Communications | 4913 | Top 28% | 6.5% |
| 6 | Briefings in Bioinformatics | 326 | Top 2% | 4.0% |

50% of probability mass above

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 7 | Bioinformatics Advances | 184 | Top 1% | 3.7% |
| 8 | Nature Computational Science | 50 | Top 0.2% | 3.1% |
| 9 | Proceedings of the National Academy of Sciences | 2130 | Top 23% | 3.1% |
| 10 | Nature Biotechnology | 147 | Top 3% | 2.8% |
| 11 | Journal of Cheminformatics | 25 | Top 0.2% | 2.4% |
| 12 | PLOS Computational Biology | 1633 | Top 13% | 2.1% |
| 13 | Protein Science | 221 | Top 0.7% | 1.9% |
| 14 | Journal of Chemical Information and Modeling | 207 | Top 2% | 1.7% |
| 15 | Nucleic Acids Research | 1128 | Top 13% | 1.4% |
| 16 | BMC Bioinformatics | 383 | Top 5% | 1.3% |
| 17 | Nature | 575 | Top 13% | 1.3% |
| 18 | eLife | 5422 | Top 48% | 1.3% |
| 19 | Cell Reports Methods | 141 | Top 4% | 0.9% |
| 20 | Advanced Science | 249 | Top 16% | 0.9% |
| 21 | Scientific Reports | 3102 | Top 70% | 0.9% |
| 22 | Communications Biology | 886 | Top 18% | 0.9% |
| 23 | Science | 429 | Top 18% | 0.9% |
| 24 | Journal of Structural Biology | 58 | Top 2% | 0.8% |
| 25 | Genome Research | 409 | Top 4% | 0.8% |
| 26 | PLOS ONE | 4510 | Top 67% | 0.8% |
| 27 | Protein Engineering, Design and Selection | 14 | Top 0.1% | 0.7% |
| 28 | Computational and Structural Biotechnology Journal | 216 | Top 10% | 0.7% |
| 29 | Structure | 175 | Top 3% | 0.7% |
| 30 | Nature Genetics | 240 | Top 9% | 0.5% |
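
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The sketch below assumes the cutoff is the first rank at which the running total reaches the threshold; that rule and the function name `journals_to_reach` are illustrative assumptions, not something the page documents.

```python
# Predicted probabilities (in percent) for the 30 listed journals, in rank order,
# copied from the list above.
probs = [
    23.0, 6.5, 6.5, 6.5, 6.5, 4.0, 3.7, 3.1, 3.1, 2.8,
    2.4, 2.1, 1.9, 1.7, 1.4, 1.3, 1.3, 1.3, 0.9, 0.9,
    0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.5,
]


def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative_mass) for the first rank whose running total
    is at least `threshold`, or (None, total) if it is never reached.

    Hypothetical helper: the cutoff rule is an assumption for illustration.
    """
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total


rank, mass = journals_to_reach(probs, 50.0)
print(rank, mass)
```

Under this rule, ranks 1 to 5 sum to 49.0%, just short of the threshold, and adding rank 6 brings the total to 53.0%, which is consistent with the cutoff marker placed after the sixth journal.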