Manufacturing-Aware Generative Model Architectures Enable Biological Sequence Design and Synthesis at Petascale
Weinstein, E. N.; Gollub, M. G.; Slabodkin, A.; Gardner, C. L.; Dobbs, K.; Cui, X.-B.; Amin, A. N.; Church, G. M.; Wood, E. B.
We introduce a method to reduce the cost of synthesizing proteins and other biological sequences designed by a generative model by as much as a trillion-fold. In particular, we make our generative models manufacturing-aware, so that model-designed sequences can be synthesized efficiently in the real world with extreme parallelism. We demonstrate this by training and synthesizing samples from generative models of antibodies, T cell antigens and DNA polymerases. For example, we train a manufacturing-aware generative model on 300 million observed human antibodies and synthesize approximately 10¹⁷ generated designs from the model, achieving sample quality comparable to a state-of-the-art protein language model at a cost of 10³ dollars. Using previous methods, synthesis of a library of the same accuracy and size would cost roughly a quadrillion (10¹⁵) dollars.