Designing protein/non-protein binding interactions using a full-atom diffusion model

Kundert, K.; Church, G.

2026-02-04 bioinformatics
10.64898/2026.02.02.693502 bioRxiv
An unresolved challenge in the field of computational protein design is to create proteins that bind non-protein partners, e.g. DNA, RNA, and small molecules. Most machine learning (ML) algorithms for protein design can only work with systems composed entirely of amino acids, and therefore cannot be directly applied to this task. The few algorithms that accommodate non-proteins still represent amino acids differently than other molecules, and therefore cannot easily recognize the similarity between a sidechain and a small molecule that share a functional group. We introduce a new method, called AtomPaint, that avoids these limitations by employing a fully-atomic representation of protein structure. Starting from a model of a desired binding interaction, our method proceeds by (i) converting that model to a 3D image, (ii) masking out the parts of that image that need to be redesigned, (iii) using a diffusion model to inpaint the masked voxels, then (iv) using a classification model to identify the amino acids in the inpainted image. Both models are SE(3)-equivariant ResNets, and were trained on a dataset of structures from the Protein Data Bank (PDB) curated to emphasize protein/non-protein interactions. In a sequence recovery benchmark, AtomPaint performed better than random guessing, suggesting that it understands some aspects of molecular structure. We discuss possible avenues of improvement, in the hopes that the advantages of our novel image-based approach can be fully realized.
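Steps (i) and (ii) of the abstract can be illustrated with a minimal sketch. This is not AtomPaint's code: the channel layout, grid size, voxel size, and the function names `voxelize` and `mask_sphere` are all hypothetical, and atoms are marked as single occupied voxels, which is a simplification of whatever atom representation the method actually uses. Steps (iii) and (iv) require the trained diffusion and classification models and are not sketched here.

```python
from math import floor

# Hypothetical channel layout: one image channel per element type.
CHANNELS = {"C": 0, "N": 1, "O": 2, "P": 3}


def voxelize(atoms, grid_size=8, voxel_size=1.0):
    """Step (i): return image[channel][i][j][k], set to 1.0 where an atom center falls."""
    image = [[[[0.0] * grid_size for _ in range(grid_size)]
              for _ in range(grid_size)] for _ in CHANNELS]
    for element, x, y, z in atoms:
        idx = [floor(c / voxel_size) for c in (x, y, z)]
        # Skip unknown elements and atoms that fall outside the grid.
        if element in CHANNELS and all(0 <= i < grid_size for i in idx):
            i, j, k = idx
            image[CHANNELS[element]][i][j][k] = 1.0
    return image


def mask_sphere(image, center, radius, voxel_size=1.0):
    """Step (ii): zero all channels within `radius` of `center`; return the boolean mask."""
    n = len(image[0])
    mask = [[[False] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Distance is measured from the center of each voxel.
                pos = [(v + 0.5) * voxel_size for v in (i, j, k)]
                d2 = sum((p - c) ** 2 for p, c in zip(pos, center))
                if d2 <= radius ** 2:
                    mask[i][j][k] = True
                    for channel in image:
                        channel[i][j][k] = 0.0
    return mask


# Toy input: (element, x, y, z) tuples in the same length units as voxel_size.
atoms = [("C", 1.2, 1.1, 1.3), ("N", 4.5, 4.5, 4.5), ("O", 6.2, 6.8, 6.1)]
image = voxelize(atoms)
mask = mask_sphere(image, center=(4.5, 4.5, 4.5), radius=1.5)
```

In this toy example the nitrogen atom sits inside the masked sphere, so its voxel is erased and would be left for an inpainting model to fill, while the carbon and oxygen atoms remain as fixed context.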

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

Rank. Journal; papers in training set; percentile; predicted probability

1. Bioinformatics; 1061 papers in training set; Top 2%; 18.3%
2. PLOS Computational Biology; 1633 papers in training set; Top 3%; 12.1%
3. Journal of Chemical Information and Modeling; 207 papers in training set; Top 1%; 4.2%
4. Bioinformatics Advances; 184 papers in training set; Top 0.9%; 4.2%
5. BMC Bioinformatics; 383 papers in training set; Top 2%; 4.2%
6. Journal of Cheminformatics; 25 papers in training set; Top 0.2%; 3.5%
7. PLOS ONE; 4510 papers in training set; Top 43%; 3.0%
8. Journal of Structural Biology; 58 papers in training set; Top 0.5%; 2.6%

50% of probability mass above

9. Journal of Computational Chemistry; 11 papers in training set; Top 0.1%; 2.6%
10. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 26%; 2.3%
11. Nature Communications; 4913 papers in training set; Top 47%; 2.0%
12. Nature Methods; 336 papers in training set; Top 4%; 1.9%
13. Cell Systems; 167 papers in training set; Top 6%; 1.9%
14. Scientific Reports; 3102 papers in training set; Top 56%; 1.8%
15. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 5%; 1.7%
16. Biophysical Journal; 545 papers in training set; Top 3%; 1.7%
17. Frontiers in Molecular Biosciences; 100 papers in training set; Top 2%; 1.7%
18. Proteins: Structure, Function, and Bioinformatics; 82 papers in training set; Top 0.6%; 1.5%
19. Nature Biotechnology; 147 papers in training set; Top 5%; 1.5%
20. Acta Crystallographica Section D Structural Biology; 54 papers in training set; Top 0.3%; 1.3%
21. Nano Letters; 63 papers in training set; Top 2%; 1.3%
22. Nature Computational Science; 50 papers in training set; Top 1%; 1.2%
23. Nucleic Acids Research; 1128 papers in training set; Top 14%; 1.2%
24. Frontiers in Genetics; 197 papers in training set; Top 8%; 0.9%
25. Communications Biology; 886 papers in training set; Top 19%; 0.9%
26. The Journal of Physical Chemistry B; 158 papers in training set; Top 2%; 0.9%
27. eLife; 5422 papers in training set; Top 56%; 0.8%
28. Physical Biology; 43 papers in training set; Top 2%; 0.8%
29. Protein Science; 221 papers in training set; Top 2%; 0.7%
30. NAR Genomics and Bioinformatics; 214 papers in training set; Top 4%; 0.7%
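
The 50% cutoff in the ranking can be checked from the listed percentages. The sketch below assumes the probabilities are sorted in descending order and that "top k" means the smallest k whose cumulative probability reaches the threshold; the function name `journals_for_mass` is hypothetical, and the inputs are the rounded values shown in the list, so the sums are approximate.

```python
def journals_for_mass(probabilities, threshold=0.5):
    """Return (k, cumulative mass) for the smallest k reaching `threshold`."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    # Threshold never reached: report everything that was summed.
    return len(probabilities), total


# Listed percentages for ranks 1 through 10, as displayed (rounded).
listed = [18.3, 12.1, 4.2, 4.2, 4.2, 3.5, 3.0, 2.6, 2.6, 2.3]
count, mass = journals_for_mass([p / 100 for p in listed])
```

With these values the first seven journals sum to about 49.5% and the first eight to about 52.1%, which is consistent with the cutoff falling after rank 8.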