Generative design of sequence specific DNA binding proteins

Sehgal, E.; Politanska, Y.; Mitra, R.; Kim, P. T.; Gonzalez Rodriguez, N.; Warrier, T.; Kubaney, A.; Morishita, A.; Quijano, R.; Butcher, J.; Krishna, R.; Pecoraro, R.; Belmont, B.; Roullier, N.; Goreshnik, I.; Vafeados, D. K.; Kwon, P.; Ramarao, R.; Taipale, J.; Glasscock, C. J.; Baker, D.

2026-04-27 synthetic biology
10.64898/2026.04.27.720408 bioRxiv

De novo protein design has advanced rapidly in recent years, yet the programmable recognition of specific DNA sequences remains a longstanding challenge. Here we describe a deep learning based approach for designing sequence selective DNA binding proteins. Our method combines structure generation using RFdiffusion3 with explicit screening against off-target interactions using AlphaFold3. We test this approach by generating 96 designs for each of 15 diverse DNA targets and identify specific binders for 7 targets, representing a ~100-fold improvement in success rates over previous approaches. We further characterize the binding landscape using variant competition assays and randomized library screening, revealing robust sequence discrimination across diverse targets. Together, these results represent a significant step forward in de novo sequence specific DNA binder design.

Matching journals

The top 4 journals together account for more than 50% of the predicted probability mass.

1. Nature Communications (4913 papers in training set, Top 2%): 26.2%
2. Cell Systems (167 papers in training set, Top 0.8%): 12.5%
3. Nucleic Acids Research (1128 papers in training set, Top 2%): 8.5%
4. Science (429 papers in training set, Top 4%): 6.9%

50% of probability mass above

5. Nature Biotechnology (147 papers in training set, Top 1%): 6.4%
6. ACS Synthetic Biology (256 papers in training set, Top 0.7%): 4.9%
7. Nature Methods (336 papers in training set, Top 2%): 4.4%
8. Nature (575 papers in training set, Top 8%): 3.1%
9. Proceedings of the National Academy of Sciences (2130 papers in training set, Top 29%): 1.9%
10. Advanced Science (249 papers in training set, Top 11%): 1.7%
11. Genome Biology (555 papers in training set, Top 5%): 1.5%
12. Nature Machine Intelligence (61 papers in training set, Top 2%): 1.3%
13. Communications Biology (886 papers in training set, Top 14%): 1.2%
14. Cell (370 papers in training set, Top 14%): 1.2%
15. Nature Computational Science (50 papers in training set, Top 1%): 1.0%
16. Journal of the American Chemical Society (199 papers in training set, Top 4%): 0.9%
17. Synthetic Biology (21 papers in training set, Top 0.1%): 0.8%
18. Scientific Reports (3102 papers in training set, Top 72%): 0.8%
19. Briefings in Bioinformatics (326 papers in training set, Top 6%): 0.8%
20. Nature Biomedical Engineering (42 papers in training set, Top 2%): 0.7%
21. Cell Genomics (162 papers in training set, Top 7%): 0.7%
22. Journal of Chemical Information and Modeling (207 papers in training set, Top 3%): 0.7%
23. Bioinformatics (1061 papers in training set, Top 10%): 0.7%
24. Nano Letters (63 papers in training set, Top 4%): 0.5%
25. Science Advances (1098 papers in training set, Top 35%): 0.5%
26. iScience (1063 papers in training set, Top 40%): 0.5%
27. Angewandte Chemie International Edition (81 papers in training set, Top 4%): 0.5%
28. Chemical Science (71 papers in training set, Top 3%): 0.5%
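
The 50% cutoff in the ranking above can be checked by summing the listed probabilities in rank order. The following is a minimal sketch, not the site's own method: the function name `journals_to_reach` and the 50.0 threshold parameter are illustrative assumptions, and the inputs are the rounded percentages shown in the list, so the total is approximate.

```python
# Predicted probabilities (percent) for the five top-ranked journals,
# copied from the ranking above. Values are rounded as displayed.
top_probs = [
    ("Nature Communications", 26.2),
    ("Cell Systems", 12.5),
    ("Nucleic Acids Research", 8.5),
    ("Science", 6.9),
    ("Nature Biotechnology", 6.4),
]


def journals_to_reach(probs, threshold=50.0):
    """Hypothetical helper: walk the ranking and return (count, running total)
    at the first rank where the cumulative mass reaches the threshold.
    Returns (None, total) if the threshold is never reached."""
    total = 0.0
    for count, (_, p) in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, mass = journals_to_reach(top_probs)
print(count, round(mass, 1))
```

With the listed values, the running total after three journals is 47.2% and after four it is 54.1%, which is why the cutoff falls after rank 4.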