
A deep learning predictor of bindable protein surfaces to guide generative synthetic biology

Almeida-Souza, L.

2026-04-16 synthetic biology
10.64898/2026.04.16.718848 bioRxiv

The advent of generative machine learning models has revolutionized de novo design of protein binders. However, wide adoption of these methods is bottlenecked by computational cost. For many targets, binder design requires computationally intensive sampling across structures, often wasting days of GPU time on unwanted or geometrically inviable regions. Here, IARA (Interface Analysis and Recognition Architecture) is introduced: a deep learning graph neural network designed as a rapid structural filter to triage generative protein binder pipelines. IARA is trained entirely on BindCraft trajectories generated against RFdiffusion-generated targets. Based on a slim network with only seven residue features, IARA maps the binder designability of input proteins in a matter of seconds. On validation runs using BindCraft, RFdiffusion and BoltzGen, IARA successfully identified the optimal binding interface for practically all targets. By rapidly pinpointing the highest-probability binding pockets, IARA democratizes synthetic biology, drastically reducing the exploratory GPU compute required for successful de novo binder generation.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Science (429 papers in training set; Top 0.1%; 41.1%)
2. Nature Communications (4913 papers in training set; Top 13%; 13.0%)
50% of probability mass above
3. Nature (575 papers in training set; Top 3%; 10.9%)
4. Nature Methods (336 papers in training set; Top 2%; 5.1%)
5. Nature Biotechnology (147 papers in training set; Top 2%; 4.5%)
6. Cell Systems (167 papers in training set; Top 3%; 3.7%)
7. Cell (370 papers in training set; Top 12%; 1.5%)
8. Nature Machine Intelligence (61 papers in training set; Top 2%; 1.5%)
9. Nature Chemical Biology (104 papers in training set; Top 3%; 1.2%)
10. Nature Materials (21 papers in training set; Top 0.7%; 0.9%)
11. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 40%; 0.9%)
12. Nature Computational Science (50 papers in training set; Top 1%; 0.9%)
13. Neuron (282 papers in training set; Top 7%; 0.9%)
14. Science Advances (1098 papers in training set; Top 27%; 0.8%)
15. Advanced Science (249 papers in training set; Top 18%; 0.8%)
16. Communications Biology (886 papers in training set; Top 22%; 0.8%)
17. Structure (175 papers in training set; Top 4%; 0.7%)
18. ACS Synthetic Biology (256 papers in training set; Top 4%; 0.5%)
19. ACS Nano (99 papers in training set; Top 5%; 0.5%)
20. Patterns (70 papers in training set; Top 3%; 0.5%)
21. Nature Structural & Molecular Biology (218 papers in training set; Top 6%; 0.5%)
22. Chemical Science (71 papers in training set; Top 3%; 0.5%)
23. Nucleic Acids Research (1128 papers in training set; Top 20%; 0.5%)
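
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked with a running total over the listed percentages. The sketch below is only an illustration of that arithmetic: the helper name `journals_to_reach` is hypothetical, and the values are copied from the list above in rank order, not taken from the site's model.

```python
from itertools import accumulate

# Predicted probabilities (percent), copied from the ranked list in rank order.
probabilities = [
    41.1, 13.0, 10.9, 5.1, 4.5, 3.7, 1.5, 1.5, 1.2, 0.9, 0.9, 0.9,
    0.9, 0.8, 0.8, 0.8, 0.7, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
]


def journals_to_reach(threshold, probs):
    """Hypothetical helper: number of top-ranked entries whose running
    total first reaches `threshold`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


# Science (41.1%) alone is below 50%; adding Nature Communications (13.0%)
# brings the running total past the threshold.
print(journals_to_reach(50.0, probabilities))  # prints 2
```

The 23 listed journals do not sum to 100%, so the remainder of the probability mass is presumably spread over journals not shown here.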