Designing protein/non-protein binding interactions using a full-atom diffusion model
Kundert, K.; Church, G.
An unresolved challenge in computational protein design is to create proteins that bind non-protein partners such as DNA, RNA, and small molecules. Most machine learning (ML) algorithms for protein design only work with systems composed entirely of amino acids, and so cannot be applied directly to this task. The few algorithms that accommodate non-protein molecules still represent amino acids differently from other molecules, and so cannot easily recognize the similarity between a sidechain and a small molecule that share a functional group. We introduce a new method, AtomPaint, that avoids these limitations by using a fully atomic representation of protein structure. Starting from a model of a desired binding interaction, the method (i) converts that model to a 3D image, (ii) masks out the parts of the image that need to be redesigned, (iii) uses a diffusion model to inpaint the masked voxels, and (iv) uses a classification model to identify the amino acids in the inpainted image. Both models are SE(3)-equivariant ResNets trained on a dataset of structures from the Protein Data Bank (PDB) curated to emphasize protein/non-protein interactions. In a sequence recovery benchmark, AtomPaint performed better than random guessing, suggesting that it captures some aspects of molecular structure. We discuss possible avenues for improvement, in the hope that the advantages of our novel image-based approach can be fully realized.
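To make steps (i) and (ii) of the abstract concrete, the sketch below turns a handful of atoms into a sparse 3D image and masks a spherical region for redesign. It is an illustration only, not AtomPaint's implementation: the element channels, the one-voxel-per-atom occupancy, the spherical mask, and every name in the code (`CHANNELS`, `voxelize`, `sphere_mask`, `apply_mask`) are assumptions that do not come from the abstract. Steps (iii) and (iv), the diffusion and classification models, are not shown.

```python
from math import dist, floor

# Hypothetical element channels; the abstract does not say which channels are used.
CHANNELS = ("C", "N", "O", "P", "S")


def voxelize(atoms, origin, voxel_size, grid_dim):
    """Step (i): map (element, x, y, z) atoms to a sparse image.

    The image is a dict {(channel, i, j, k): 1.0}. Each atom simply occupies
    the voxel that contains its center (an assumed, simplified encoding).
    """
    image = {}
    for element, x, y, z in atoms:
        if element not in CHANNELS:
            continue  # elements without a channel are skipped in this sketch
        idx = tuple(floor((c - o) / voxel_size) for c, o in zip((x, y, z), origin))
        if all(0 <= i < grid_dim for i in idx):
            image[(CHANNELS.index(element), *idx)] = 1.0
    return image


def sphere_mask(center, radius, origin, voxel_size, grid_dim):
    """Step (ii): collect the (i, j, k) voxels whose centers lie within the sphere."""
    mask = set()
    for i in range(grid_dim):
        for j in range(grid_dim):
            for k in range(grid_dim):
                voxel_center = tuple(
                    o + (n + 0.5) * voxel_size for n, o in zip((i, j, k), origin)
                )
                if dist(voxel_center, center) <= radius:
                    mask.add((i, j, k))
    return mask


def apply_mask(image, mask):
    """Drop every channel value inside the mask; these voxels would be inpainted."""
    return {key: value for key, value in image.items() if key[1:] not in mask}


# Toy input: four atoms in an 8 x 8 x 8 grid of 1 Angstrom voxels.
atoms = [
    ("C", 0.5, 0.5, 0.5),
    ("N", 2.5, 0.5, 0.5),
    ("O", 6.5, 6.5, 6.5),
    ("P", 7.2, 7.9, 0.1),
]
origin = (0.0, 0.0, 0.0)
image = voxelize(atoms, origin, voxel_size=1.0, grid_dim=8)
mask = sphere_mask((0.5, 0.5, 0.5), 2.5, origin, voxel_size=1.0, grid_dim=8)
kept = apply_mask(image, mask)
```

In this toy case the carbon and nitrogen atoms fall inside the masked sphere, so only the oxygen and phosphorus voxels remain in `kept`. A generative model would then be asked to fill the masked voxels conditioned on what remains. A dense array would be the more usual container for a real model; the sparse dict is used here only to keep the sketch dependency-free.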
Matching journals
The top 8 journals account for 50% of the predicted probability mass.