FrameDiPT: SE(3) Diffusion Model for Protein Structure Inpainting

Zhang, C.; Leach, A.; Makkink, T.; Arbesu, M.; Kadri, I.; Luo, D.; Mizrahi, L.; Krichen, S.; Lang, M.; Tovchigrechko, A.; Lopez Carranza, N.; Sahin, U.; Beguir, K.; Rooney, M.; Fu, Y.

2024-01-20 immunology
10.1101/2023.11.21.568057 bioRxiv

The field of protein structure prediction has been revolutionised by deep learning, with protein folding models such as AlphaFold 2 and ESMFold. These models enable rapid in silico prediction and have been integrated into de novo protein design and protein-protein interaction (PPI) prediction. However, biologically relevant features that depend on conformational distributions cannot be estimated with these models. Diffusion models, a novel class of generative models, have been developed to learn conformational distributions and have been applied to de novo protein design. Limited work has been done on protein structure inpainting, where a masked section is recovered by simultaneously conditioning on its sequence and the rest of the structure. In this work, we propose FrameDiff inPainTing (FrameDiPT), a generalised model for protein inpainting. This is important for T-cells given the hyper-variability of the complementarity determining region (CDR) loops. We evaluated the model on CDR loop design for T-cell receptors and achieved prediction accuracy comparable to ProteinGenerator and RFdiffusion with limited training data and learnable parameters. Unlike deterministic structure prediction models, FrameDiPT captures the conformational distribution at different regions and binding states, highlighting a key advantage of generative models. The model and inference code have been released.

Matching journals

The top-ranked journal alone accounts for at least 50% of the predicted probability mass.

1. Bioinformatics; 1061 papers in training set; Top 0.2%; 54.7%
(50% of probability mass above)
2. Nature Computational Science; 50 papers in training set; Top 0.1%; 11.0%
3. Nature Communications; 4913 papers in training set; Top 20%; 9.6%
4. Journal of Chemical Information and Modeling; 207 papers in training set; Top 2%; 1.8%
5. Nucleic Acids Research; 1128 papers in training set; Top 10%; 1.7%
6. Nature Methods; 336 papers in training set; Top 4%; 1.6%
7. PLOS Computational Biology; 1633 papers in training set; Top 18%; 1.4%
8. Communications Biology; 886 papers in training set; Top 13%; 1.3%
9. Bioinformatics Advances; 184 papers in training set; Top 4%; 1.0%
10. Cell Systems; 167 papers in training set; Top 10%; 1.0%
11. Scientific Reports; 3102 papers in training set; Top 70%; 0.9%
12. Briefings in Bioinformatics; 326 papers in training set; Top 6%; 0.9%
13. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 43%; 0.8%
14. eLife; 5422 papers in training set; Top 57%; 0.8%
15. Journal of Cheminformatics; 25 papers in training set; Top 0.6%; 0.7%
16. Structure; 175 papers in training set; Top 4%; 0.7%
17. Advanced Science; 249 papers in training set; Top 21%; 0.7%
18. iScience; 1063 papers in training set; Top 36%; 0.7%
19. Nature Biotechnology; 147 papers in training set; Top 8%; 0.7%
20. National Science Review; 22 papers in training set; Top 3%; 0.5%
21. Journal of Molecular Biology; 217 papers in training set; Top 5%; 0.5%
22. Protein Science; 221 papers in training set; Top 2%; 0.5%
23. BMC Bioinformatics; 383 papers in training set; Top 8%; 0.5%
24. Cell Reports Methods; 141 papers in training set; Top 6%; 0.5%