What comes after de novo? Automated lead optimization of proteins with CRADLE-1
Bixby, E.; Brunner, G.; Danciu, D.; Dela Rosa, R.; Deutschmann, N.; Ferragu, C.; Geiger, F.; Holberg, C.; Kidger, P.; Lindoulsi, A.; Lutz, N.; McColgan, T.; Milius, S.; Shah, J.; Vandeloo, M.; Vidas, P.; Ziegler, J. D.; van Rossum, H.; van der Vorm, D.; Baldi, N.; IJSpeert, C.; Monza, E.; Schriek, A.
Lead optimization remains the longest and most expensive step in pre-clinical drug discovery, typically taking 12-36 months and costing $5M-$15M per candidate. We introduce CRADLE-1, an automated framework for protein engineering. Although CRADLE-1 supports the full drug discovery and industrial protein engineering pipeline, including hit identification and de novo binder design, this work focuses on its application to multi-property lead optimization across protein modalities (VHHs, scFvs, IgGs, peptides, enzymes, CRISPR systems, and vaccines). We show that it is 4-7x faster than rational design, as measured by the number of wet-lab rounds required. We provide in vitro validation across all of these modalities, typically optimizing multiple properties simultaneously (e.g., single and polyspecific binding with affinities down to the picomolar range, activity, and thermostability). Technically, CRADLE-1 starts from pre-trained foundation protein language models (PLMs), which are fine-tuned in an unsupervised fashion on evolutionary neighborhoods and in a supervised fashion on lab-in-the-loop data, and are then deployed in a multi-model workflow. Of additional interest, we find that (a) the end-to-end system can be run in an automated fashion; (b) wet-lab data can be consumed as a black box, without knowledge of the underlying biochemical mechanisms; and (c) structural data can largely be superseded by sequence-function pairs.
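The abstract describes the lab-in-the-loop cycle only at a high level. The sketch that follows illustrates the generic design-measure-refit pattern under loud assumptions and is not CRADLE-1's method: a toy additive model stands in for the fine-tuned PLMs, a hypothetical `toy_assay` function stands in for wet-lab measurements, and two properties are combined by a weighted sum. All names (`toy_assay`, `fit_additive_surrogate`, `propose`, `lab_in_the_loop`) are hypothetical. It shows how measured sequence-function pairs can be consumed as a black box and how progress is naturally counted in rounds.

```python
import random
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_assay(seq: str) -> tuple[float, float]:
    """Hypothetical stand-in for a wet-lab assay returning two properties
    (a 'binding' and a 'stability' proxy). The loop treats it as a black box."""
    binding = float(sum(a in "WYF" for a in seq))
    stability = sum(a in "ILV" for a in seq) - 0.5 * seq.count("G")
    return binding, stability


def fit_additive_surrogate(data):
    """Toy surrogate (assumed stand-in for fine-tuned PLMs): each (position, residue)
    effect is the mean score of measured sequences carrying it, minus the global mean."""
    global_mean = mean(score for _, score in data)
    by_site = {}
    for seq, score in data:
        for i, a in enumerate(seq):
            by_site.setdefault((i, a), []).append(score)
    effects = {k: mean(v) - global_mean for k, v in by_site.items()}

    def predict(seq):
        return global_mean + sum(effects.get((i, a), 0.0) for i, a in enumerate(seq))

    return predict


def propose(parents, n, rng, max_tries=10_000):
    """Generate up to n distinct candidates, each a parent with one position resampled."""
    candidates = set()
    for _ in range(max_tries):
        if len(candidates) >= n:
            break
        seq = list(rng.choice(parents))
        seq[rng.randrange(len(seq))] = rng.choice(AMINO_ACIDS)
        candidates.add("".join(seq))
    return sorted(candidates)  # sorted for run-to-run determinism


def lab_in_the_loop(start, rounds=4, batch=8, weights=(1.0, 1.0), seed=0):
    """Run design-measure-refit rounds; return the best score after each round
    and the best (sequence, score) pair measured overall."""
    rng = random.Random(seed)

    def objective(props):
        # Weighted-sum scalarization of the measured properties (an assumption).
        return sum(w * p for w, p in zip(weights, props))

    data = [(start, objective(toy_assay(start)))]
    history = []
    for _ in range(rounds):
        predict = fit_additive_surrogate(data)  # refit on all data measured so far
        parents = [s for s, _ in sorted(data, key=lambda d: d[1], reverse=True)[:3]]
        measured = {s for s, _ in data}
        pool = [c for c in propose(parents, 200, rng) if c not in measured]
        chosen = sorted(pool, key=predict, reverse=True)[:batch]  # design step
        data += [(s, objective(toy_assay(s))) for s in chosen]  # "wet-lab" step
        history.append(max(score for _, score in data))
    return history, max(data, key=lambda d: d[1])


if __name__ == "__main__":
    history, (best_seq, best_score) = lab_in_the_loop("MKTAYIAKQRQG")
    print(history, best_seq, best_score)
```

The weighted sum is only the simplest way to combine several properties; Pareto-based selection is a common alternative when trade-offs between properties matter.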