
Optimization of PURE system composition using automation and active learning

Bernard-Lapeyre, Y.; Cleij, C.; Sakai, A.; Huguet, M.-J.; Danelon, C.

2026-03-25 synthetic biology
10.64898/2026.03.23.713685 bioRxiv

The protein synthesis using recombinant elements (PURE) system has been widely applied in various biological research fields and in synthetic cell construction. Efforts to enhance PURE system performance by adjusting its individual components have remained limited to the expression of single genes, with only a small number of molecular compositions tested, making it difficult to link component composition to system-level performance across different DNA contexts. Here, we combine automated acoustic liquid handling with an active learning framework to broadly explore the compositional landscape of the PURE system. By grouping the 69 individual components (including proteins and tRNAs) into 21 functional sets and iteratively guiding experiments with active learning, we rapidly identify improved compositions and demonstrate up to 3-fold enhancement in protein yield and translation rate for a single reporter gene. We further show that optimization drivers differ between low and high DNA concentrations, revealing that optimal PURE compositions are DNA concentration-dependent. We then apply this optimization strategy to enhance the expression of a 41-kb synthetic chromosome containing 15 genes by maximizing the fluorescence intensities of two reporter proteins. While a 3-fold improvement could be reached for the two gene products guiding learning, a full proteomic analysis revealed that optimization is gene-specific, i.e., changes in PURE system composition differentially affect the amounts of synthesized proteins encoded on the same DNA template. Together, this work establishes active learning as an efficient strategy to navigate the high-dimensional PURE compositional space and provides mechanistic insight into the DNA context-dependence of gene expression optimization.
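The iterative scheme described in the abstract (propose compositions, measure them, update a model, propose again) can be illustrated with a minimal sketch. Everything beyond the count of 21 functional sets is an assumption made for illustration: the three concentration levels, the `simulated_yield` toy landscape standing in for a real measurement, the k-nearest-neighbour surrogate, and the upper-confidence-bound style score are hypothetical choices, not the authors' model or data.

```python
import random

N_SETS = 21                 # number of functional sets, as stated in the abstract
LEVELS = [0.5, 1.0, 2.0]    # hypothetical relative concentration levels


def simulated_yield(comp):
    """Hypothetical stand-in for an experimental readout (not real data)."""
    # Toy landscape: each set has a preferred level; yield drops with distance from it.
    return sum(1.0 - abs(c - LEVELS[i % len(LEVELS)]) for i, c in enumerate(comp))


def distance(a, b):
    """Manhattan distance between two compositions."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict(comp, measured, k=3):
    """k-nearest-neighbour surrogate: mean yield of the k closest measured
    compositions, plus distance to the closest one as a crude uncertainty proxy."""
    nearest = sorted(measured, key=lambda m: distance(comp, m[0]))[:k]
    mean = sum(y for _, y in nearest) / len(nearest)
    return mean, distance(comp, nearest[0][0])


def active_learning(rounds=5, batch=8, pool_size=200, kappa=0.5, seed=0):
    rng = random.Random(seed)

    def sample():
        return tuple(rng.choice(LEVELS) for _ in range(N_SETS))

    # Round 0: random initial design, "measured" with the toy function.
    measured = [(c, simulated_yield(c)) for c in (sample() for _ in range(batch))]
    history = [max(y for _, y in measured)]  # best yield seen so far, per round

    for _ in range(rounds):
        pool = [sample() for _ in range(pool_size)]  # candidate compositions

        def score(c):
            # Exploit high predicted yield, explore compositions far from the data.
            mean, uncertainty = predict(c, measured)
            return mean + kappa * uncertainty

        chosen = sorted(pool, key=score, reverse=True)[:batch]
        measured += [(c, simulated_yield(c)) for c in chosen]
        history.append(max(y for _, y in measured))
    return measured, history


if __name__ == "__main__":
    _, best_per_round = active_learning()
    print(best_per_round)
```

In a real campaign the call to `simulated_yield` would be replaced by a liquid-handling run and a fluorescence readout, and the surrogate would typically be a model with calibrated uncertainty; the loop structure is the only part this sketch is meant to convey.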

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. ACS Synthetic Biology, 256 papers in training set, Top 0.3%, 18.1%
2. Nature Communications, 4913 papers in training set, Top 7%, 18.1%
3. Nucleic Acids Research, 1128 papers in training set, Top 2%, 8.2%
4. Cell Systems, 167 papers in training set, Top 1%, 8.2%
50% of probability mass above
5. Advanced Science, 249 papers in training set, Top 4%, 4.7%
6. Nature Chemical Biology, 104 papers in training set, Top 1.0%, 2.8%
7. Journal of the American Chemical Society, 199 papers in training set, Top 2%, 2.8%
8. Angewandte Chemie International Edition, 81 papers in training set, Top 1%, 2.4%
9. Synthetic Biology, 21 papers in training set, Top 0.1%, 2.0%
10. Nature Biotechnology, 147 papers in training set, Top 4%, 2.0%
11. ACS Nano, 99 papers in training set, Top 2%, 1.8%
12. Metabolic Engineering, 68 papers in training set, Top 0.4%, 1.7%
13. Nano Letters, 63 papers in training set, Top 2%, 1.7%
14. Science Advances, 1098 papers in training set, Top 18%, 1.7%
15. ACS Central Science, 66 papers in training set, Top 1%, 1.7%
16. Proceedings of the National Academy of Sciences, 2130 papers in training set, Top 33%, 1.7%
17. eLife, 5422 papers in training set, Top 46%, 1.4%
18. Communications Chemistry, 39 papers in training set, Top 0.5%, 1.2%
19. Cell Reports Methods, 141 papers in training set, Top 3%, 1.2%
20. Nature Methods, 336 papers in training set, Top 6%, 0.9%
21. Cell, 370 papers in training set, Top 16%, 0.8%
22. iScience, 1063 papers in training set, Top 34%, 0.7%
23. Communications Biology, 886 papers in training set, Top 27%, 0.7%
24. Nature Biomedical Engineering, 42 papers in training set, Top 3%, 0.6%
25. Nature Machine Intelligence, 61 papers in training set, Top 4%, 0.6%
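
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum over the listed percentages. This is only an arithmetic check on the values shown in the list; the helper name `journals_to_reach` is hypothetical, and only the ten highest-ranked percentages are copied in.

```python
# Predicted probabilities (%) for the ten top-ranked journals, copied from the list.
probs = [18.1, 18.1, 8.2, 8.2, 4.7, 2.8, 2.8, 2.4, 2.0, 2.0]


def journals_to_reach(threshold, probabilities):
    """Return how many top-ranked entries are needed to reach `threshold` percent."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count
    return None  # threshold not reached within the listed entries


print(journals_to_reach(50, probs))  # prints 4
```

The first three entries sum to 44.4%, and adding the fourth brings the total to 52.6%, so the 50% mark is crossed at rank 4.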