
Manufacturing-Aware Generative Model Architectures Enable Biological Sequence Design and Synthesis at Petascale

Weinstein, E. N.; Gollub, M. G.; Slabodkin, A.; Gardner, C. L.; Dobbs, K.; Cui, X.-B.; Amin, A. N.; Church, G. M.; Wood, E. B.

2024-10-24 bioengineering
10.1101/2024.09.13.612900 bioRxiv

We introduce a method to reduce the cost of synthesizing proteins and other biological sequences designed by a generative model by as much as a trillion-fold. In particular, we make our generative models manufacturing-aware, such that model-designed sequences can be efficiently synthesized in the real world with extreme parallelism. We demonstrate by training and synthesizing samples from generative models of antibodies, T cell antigens and DNA polymerases. For example, we train a manufacturing-aware generative model on 300 million observed human antibodies and synthesize ~10^17 generated designs from the model, achieving a sample quality comparable to a state-of-the-art protein language model, at a cost of 10^3 dollars. Using previous methods, synthesis of a library of the same accuracy and size would cost roughly a quadrillion (10^15) dollars.
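The "trillion-fold" figure follows from the order-of-magnitude numbers quoted in the abstract. A minimal sketch of that arithmetic, using only those three quoted values; the variable names are illustrative and the per-design costs are derived here rather than stated in the abstract:

```python
from fractions import Fraction

# Order-of-magnitude figures quoted in the abstract.
n_designs = 10**17          # generated antibody designs synthesized
cost_new_usd = 10**3        # cost with the manufacturing-aware method
cost_previous_usd = 10**15  # estimated cost with previous methods

# Ratio of the two total costs: the claimed cost reduction.
fold_reduction = cost_previous_usd // cost_new_usd

# Derived per-design costs (not stated in the abstract), kept exact.
cost_per_design_new = Fraction(cost_new_usd, n_designs)
cost_per_design_previous = Fraction(cost_previous_usd, n_designs)

print(f"fold reduction: {fold_reduction:.0e}")
print(f"cost per design (new): {float(cost_per_design_new):.0e} USD")
print(f"cost per design (previous): {float(cost_per_design_previous):.0e} USD")
```

The ratio of 10^15 to 10^3 is 10^12, which matches the "as much as a trillion-fold" reduction stated at the start of the abstract.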

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

1. Cell Systems; 167 papers in training set; Top 0.1%; 53.6%
50% of probability mass above
2. Nature Methods; 336 papers in training set; Top 0.9%; 10.4%
3. Nature Communications; 4913 papers in training set; Top 27%; 6.6%
4. Science; 429 papers in training set; Top 10%; 3.2%
5. Genome Research; 409 papers in training set; Top 2%; 1.8%
6. Nature Biotechnology; 147 papers in training set; Top 4%; 1.8%
7. Bioinformatics; 1061 papers in training set; Top 7%; 1.7%
8. Nature; 575 papers in training set; Top 12%; 1.4%
9. Nature Medicine; 117 papers in training set; Top 3%; 1.4%
10. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 36%; 1.4%
11. iScience; 1063 papers in training set; Top 21%; 1.3%
12. PLOS ONE; 4510 papers in training set; Top 60%; 1.3%
13. Nature Machine Intelligence; 61 papers in training set; Top 2%; 1.3%
14. Scientific Reports; 3102 papers in training set; Top 68%; 1.1%
15. Nature Biomedical Engineering; 42 papers in training set; Top 1%; 0.9%
16. PLOS Computational Biology; 1633 papers in training set; Top 24%; 0.8%
17. Frontiers in Computational Neuroscience; 53 papers in training set; Top 2%; 0.7%
18. Patterns; 70 papers in training set; Top 3%; 0.7%
19. Nature Computational Science; 50 papers in training set; Top 2%; 0.7%
20. Nature Neuroscience; 216 papers in training set; Top 7%; 0.7%
21. Physical Review Research; 46 papers in training set; Top 1.0%; 0.7%
22. Science Advances; 1098 papers in training set; Top 34%; 0.5%
23. Cancer Research; 116 papers in training set; Top 4%; 0.5%
24. Advanced Science; 249 papers in training set; Top 23%; 0.5%
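
The "50% of probability mass" marker in the list above can be reproduced by accumulating the predicted probabilities in rank order. This is a minimal sketch that assumes the final percentage on each entry is that journal's predicted probability. The function name `journals_to_reach` is hypothetical, and only the first five entries are included for brevity:

```python
from itertools import accumulate

# (journal, predicted probability in percent), first five entries of the list.
predicted = [
    ("Cell Systems", 53.6),
    ("Nature Methods", 10.4),
    ("Nature Communications", 6.6),
    ("Science", 3.2),
    ("Genome Research", 1.8),
]

def journals_to_reach(predictions, threshold_pct):
    """Return how many top-ranked journals are needed for the cumulative
    probability to reach threshold_pct, or None if it is never reached."""
    ranked = sorted(predictions, key=lambda item: item[1], reverse=True)
    running_totals = accumulate(prob for _, prob in ranked)
    for count, total in enumerate(running_totals, start=1):
        if total >= threshold_pct:
            return count
    return None

top_k = journals_to_reach(predicted, 50.0)
print(f"journals needed for 50% of probability mass: {top_k}")
```

With these values the first entry alone (53.6%) already exceeds the threshold, which is consistent with the statement that the top 1 journal accounts for 50% of the predicted probability mass.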