What comes after de novo? Automated lead optimization of proteins with CRADLE-1

Bixby, E.; Brunner, G.; Danciu, D.; Dela Rosa, R.; Deutschmann, N.; Ferragu, C.; Geiger, F.; Holberg, C.; Kidger, P.; Lindoulsi, A.; Lutz, N.; McColgan, T.; Milius, S.; Shah, J.; Vandeloo, M.; Vidas, P.; Ziegler, J. D.; van Rossum, H.; van der Vorm, D.; Baldi, N.; IJSpeert, C.; Monza, E.; Schriek, A.

2026-03-08 bioengineering
10.64898/2026.03.06.710001 bioRxiv

Lead optimization remains the longest and most expensive step in pre-clinical drug discovery, typically consuming 12-36 months and costing $5M-$15M per candidate. We introduce CRADLE-1, an automated framework for protein engineering. While CRADLE-1 supports the full process of drug discovery and industrial protein engineering pipelines, including hit identification and de novo binder design, this work focuses on its application to multi-property lead optimization across protein modalities (VHHs, scFvs, IgGs, peptides, enzymes, CRISPR systems, vaccines). We show it is 4-7x faster than rational design, as measured by the number of wet lab rounds required. We provide in vitro validation across all of the above modalities, typically optimizing multiple properties simultaneously (single and polyspecific binding down to picomolar, activity, thermostability, ...). Technically, CRADLE-1 starts with pre-trained foundation protein language models (PLMs), which are fine-tuned in an unsupervised fashion on evolutionary neighborhoods, then in a supervised fashion using lab-in-the-loop data, and finally deployed in a multi-model workflow. Of additional interest, we find that (a) the end-to-end system may be run in an automated fashion; (b) wet lab data may be consumed in a black-box fashion without knowledge of the underlying biochemical mechanisms; (c) structural data may largely be superseded by sequence-function pairs.

Matching journals

The top 3 journals together account for just over 50% (52.7%) of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Cell Systems | 167 | Top 0.1% | 33.4%
2 | Nature Methods | 336 | Top 0.7% | 12.9%
3 | Nature Biotechnology | 147 | Top 1% | 6.4%
--- 50% of probability mass above ---
4 | Proceedings of the National Academy of Sciences | 2130 | Top 13% | 4.9%
5 | Nature Communications | 4913 | Top 35% | 4.4%
6 | Protein Engineering, Design and Selection | 14 | Top 0.1% | 3.1%
7 | Science | 429 | Top 12% | 2.1%
8 | Bioinformatics | 1061 | Top 7% | 1.9%
9 | Journal of Chemical Information and Modeling | 207 | Top 2% | 1.9%
10 | Protein Science | 221 | Top 0.8% | 1.7%
11 | PLOS Computational Biology | 1633 | Top 16% | 1.7%
12 | Chemical Science | 71 | Top 0.9% | 1.7%
13 | Nature Machine Intelligence | 61 | Top 2% | 1.7%
14 | Scientific Reports | 3102 | Top 62% | 1.5%
15 | Advanced Science | 249 | Top 13% | 1.3%
16 | PLOS ONE | 4510 | Top 60% | 1.2%
17 | Computational and Structural Biotechnology Journal | 216 | Top 7% | 1.0%
18 | Cell Reports Methods | 141 | Top 4% | 0.9%
19 | iScience | 1063 | Top 26% | 0.9%
20 | Journal of Cheminformatics | 25 | Top 0.5% | 0.9%
21 | ACS Synthetic Biology | 256 | Top 2% | 0.9%
22 | Nucleic Acids Research | 1128 | Top 16% | 0.8%
23 | Angewandte Chemie International Edition | 81 | Top 3% | 0.8%
24 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 0.9% | 0.8%
25 | Patterns | 70 | Top 3% | 0.7%
26 | Communications Biology | 886 | Top 28% | 0.7%
27 | Nature Medicine | 117 | Top 6% | 0.7%
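
The 50% cutoff noted in the list can be checked from the listed probabilities. The page does not say how the cutoff is computed, so the sketch below assumes it marks the smallest number of top-ranked journals whose cumulative probability reaches 50%. The names `probabilities` and `journals_to_reach` are hypothetical and used only for illustration.

```python
# Predicted probabilities (percent) for the five top-ranked journals,
# copied from the list above.
probabilities = [
    ("Cell Systems", 33.4),
    ("Nature Methods", 12.9),
    ("Nature Biotechnology", 6.4),
    ("Proceedings of the National Academy of Sciences", 4.9),
    ("Nature Communications", 4.4),
]


def journals_to_reach(ranked, threshold):
    """Return (count, cumulative mass) for the shortest ranked prefix
    whose summed probability reaches `threshold`, or None if the listed
    journals never reach it.

    Assumption: the cutoff is a simple cumulative sum over the ranking.
    """
    cumulative = 0.0
    for count, (_, probability) in enumerate(ranked, start=1):
        cumulative += probability
        if cumulative >= threshold:
            return count, cumulative
    return None


count, mass = journals_to_reach(probabilities, 50.0)
print(f"{count} journals reach {mass:.1f}% of the probability mass")
```

Under that assumption the first two journals fall short (46.3%), and adding the third crosses the threshold, which matches the position of the cutoff marker in the list.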