De novo protein fold families expand the designable ligand binding site space

Pan, X.; Kortemme, T.

2021-01-15 biophysics
10.1101/2021.01.13.426598 bioRxiv
A major challenge in designing proteins de novo to bind user-defined ligands with high specificity and affinity is finding backbone structures that can accommodate a desired binding site geometry with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place ("match") these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in a ligand binding site is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The numbers of matched binding sites are power-law functions of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.

Author summary

De novo design of proteins that can bind to novel and highly diverse user-defined small molecule ligands could have broad biomedical and synthetic biology applications. Because ligand binding site geometries need to be accommodated by protein backbone scaffolds at high accuracy, the diversity of scaffolds is a major limitation for designing new ligand binding functions.
Advances in computational protein structure design methods have significantly increased the number of accessible stable scaffold structures. Understanding how many new ligand binding sites can be accommodated by the de novo scaffolds is important for designing novel ligand binding proteins. To answer this question, we constructed a large library of ligand binding sites from the Protein Data Bank (PDB). We tested the number of ligand binding sites that can be accommodated by de novo scaffolds and by naturally existing scaffolds with the same fold topologies. The results showed that de novo scaffolds significantly expanded the ligand binding space of their respective fold topologies. We also identified factors that affect the difficulty of accommodating a binding site, as well as the relationship between the number of scaffolds and the accessible ligand binding site space. We believe our findings will benefit future method development and applications of ligand binding protein design.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                               Papers in training set  Percentile  Predicted probability
1     PLOS Computational Biology                            1633                    Top 1%      18.5%
2     Proteins: Structure, Function, and Bioinformatics     82                      Top 0.1%    14.3%
3     Journal of Chemical Information and Modeling          207                     Top 0.4%    12.6%
4     Journal of Chemical Theory and Computation            126                     Top 0.2%    6.3%
50% of probability mass above
5     Protein Science                                       221                     Top 0.2%    6.3%
6     Biophysical Journal                                   545                     Top 1%      4.8%
7     The Journal of Physical Chemistry B                   158                     Top 0.5%    3.7%
8     Frontiers in Molecular Biosciences                    100                     Top 0.6%    2.9%
9     PLOS ONE                                              4510                    Top 45%     2.6%
10    Structure                                             175                     Top 1%      2.3%
11    Acta Crystallographica Section D Structural Biology   54                      Top 0.2%    2.1%
12    Bioinformatics                                        1061                    Top 7%      1.7%
13    Physical Biology                                      43                      Top 1%      1.7%
14    Scientific Reports                                    3102                    Top 64%     1.3%
15    IUCrJ                                                 29                      Top 0.2%    1.2%
16    ACS Omega                                             90                      Top 3%      0.9%
17    Nature Communications                                 4913                    Top 63%     0.7%
18    Communications Chemistry                              39                      Top 1%      0.7%
19    Journal of Cheminformatics                            25                      Top 0.6%    0.7%
20    Proceedings of the National Academy of Sciences       2130                    Top 46%     0.7%
21    Bioinformatics Advances                               184                     Top 5%      0.7%
22    Journal of Molecular Biology                          217                     Top 4%      0.6%
23    Computational and Structural Biotechnology Journal    216                     Top 11%     0.6%
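
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The following is a minimal sketch, not the site's own method: the function name `journals_needed` and the 50% threshold parameter are assumptions introduced for illustration, and the values are copied from the ranked list in rank order.

```python
# Predicted probabilities (%) copied from the ranked journal list, in rank order.
probabilities = [
    18.5, 14.3, 12.6, 6.3, 6.3, 4.8, 3.7, 2.9, 2.6, 2.3, 2.1, 1.7,
    1.7, 1.3, 1.2, 0.9, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6,
]


def journals_needed(probs, threshold=50.0):
    """Return (count, cumulative_mass) for the smallest top-ranked prefix
    whose summed probability reaches the threshold.

    Hypothetical helper for illustration; returns (None, total) if the
    threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, mass = journals_needed(probabilities)
print(count, round(mass, 1))
```

With these values, the running sum first reaches the threshold at the fourth journal, which matches the position of the "50% of probability mass above" divider.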