CryptKeeper: a negative design tool for reducing unintentional gene expression in bacteria

Roots, C. T.; Barrick, J. E.

2024-09-05 synthetic biology
10.1101/2024.09.05.611466 bioRxiv

Foundational techniques in molecular biology (such as cloning genes, tagging biomolecules for purification or identification, and overexpressing recombinant proteins) rely on introducing non-native or synthetic DNA sequences into organisms. These sequences may be recognized by the transcription and translation machinery in their new context in unintended ways. The cryptic gene expression that sometimes results has been shown to produce genetic instability and mask experimental signals. Computational tools have been developed to predict individual types of gene expression elements, but it can be difficult for researchers to contextualize their collective output. Here, we introduce CryptKeeper, a software pipeline that visualizes predictions of bacterial gene expression signals and estimates the translational burden possible from a DNA sequence. We investigate several published examples where cryptic gene expression in E. coli interfered with experiments. CryptKeeper accurately postdicts unwanted gene expression from both eukaryotic virus infectious clones and individual proteins that led to genetic instability. It also identifies off-target gene expression elements that resulted in truncations that confounded protein purification. Incorporating negative design using CryptKeeper into reverse genetics and synthetic biology workflows can help to mitigate cloning challenges and avoid unexplained failures and complications that arise from unintentional gene expression.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. ACS Synthetic Biology | 256 papers in training set | Top 0.1% | 33.2%
2. Synthetic Biology | 21 papers in training set | Top 0.1% | 10.2%
3. Nucleic Acids Research | 1128 papers in training set | Top 4% | 4.9%
4. Cell Systems | 167 papers in training set | Top 2% | 4.9%
(50% of probability mass above)
5. Nature Methods | 336 papers in training set | Top 3% | 3.6%
6. Molecular Systems Biology | 142 papers in training set | Top 0.2% | 3.6%
7. Nature Communications | 4913 papers in training set | Top 39% | 3.6%
8. BMC Bioinformatics | 383 papers in training set | Top 3% | 3.1%
9. Genome Biology | 555 papers in training set | Top 3% | 2.8%
10. PLOS Computational Biology | 1633 papers in training set | Top 13% | 2.1%
11. Science | 429 papers in training set | Top 12% | 2.1%
12. PLOS ONE | 4510 papers in training set | Top 50% | 1.9%
13. Nature | 575 papers in training set | Top 10% | 1.7%
14. Nature Biotechnology | 147 papers in training set | Top 4% | 1.7%
15. Bioinformatics | 1061 papers in training set | Top 8% | 1.5%
16. Cell | 370 papers in training set | Top 14% | 1.2%
17. GigaScience | 172 papers in training set | Top 2% | 1.2%
18. Journal of Molecular Biology | 217 papers in training set | Top 3% | 1.0%
19. mSystems | 361 papers in training set | Top 6% | 0.9%
20. eLife | 5422 papers in training set | Top 53% | 0.9%
21. NAR Genomics and Bioinformatics | 214 papers in training set | Top 4% | 0.8%
22. Protein Science | 221 papers in training set | Top 2% | 0.7%
23. Protein Engineering, Design and Selection | 14 papers in training set | Top 0.1% | 0.7%
24. Frontiers in Molecular Biosciences | 100 papers in training set | Top 6% | 0.6%
25. Genome Research | 409 papers in training set | Top 5% | 0.5%
26. Cell Reports Methods | 141 papers in training set | Top 7% | 0.5%
27. Nature Computational Science | 50 papers in training set | Top 2% | 0.5%
28. Structure | 175 papers in training set | Top 4% | 0.5%
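
The statement that the top 4 journals cover 50% of the predicted probability mass can be checked with a running total over the listed percentages. The sketch below is only an illustration of that arithmetic: the function name `journals_to_reach` is hypothetical, and the values are the first eight probabilities copied from the ranked list above.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the eight top-ranked journals,
# copied from the ranked list above.
probabilities = [33.2, 10.2, 4.9, 4.9, 3.6, 3.6, 3.6, 3.1]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running
    total to reach the threshold, or None if it is never reached."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None  # threshold not reached with the entries given


print(journals_to_reach(probabilities, 50.0))  # prints 4
```

The running total after three journals is about 48.3%, and the fourth brings it to about 53.2%, which is why the 50% marker sits between ranks 4 and 5.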