Repurposing the dark genome. II - Reverse Proteins

Nayak, S.; Dhar, P. K.

2023-03-21 synthetic biology
10.1101/2023.03.20.533367 bioRxiv
Based on the expression blueprint encoded in the genome, three groups of sequences have been identified: protein-encoding, RNA-encoding, and non-expressing. We asked: Why did nature choose a particular DNA sequence for expression? Did she sample every possibility, approving some for RNA synthesis and some for protein synthesis, and retiring or ignoring the rest? If evolution randomly selected sequences for metabolic trials, how much non-utilized (non-expressing) and under-utilized (RNA-encoding only) information is currently available for innovation? These questions led us to experimentally synthesize functional proteins from intergenic sequences of E. coli (Dhar et al., 2009). The current work extends that original report by considering natural protein-coding sequences read backward to generate new possibilities. Reverse proteins are full-length translation equivalents of existing protein-coding genes read in the -1 frame. The structural, functional, and interaction predictions for reverse proteins in E. coli, S. cerevisiae, and D. melanogaster open up a new opportunity to produce first-in-class proteins directed toward functional endpoints. This study points to a large untapped genomic space, from both fundamental-biology and applications perspectives.
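
As a rough illustration of what "read backward" could mean in practice, the following minimal Python sketch translates a toy coding sequence in the forward direction and in two backward readings. The abstract does not spell out the exact construction, so both readings are assumptions: `reversed_read` simply reverses the base order of the same strand, while `reverse_complement` reads the opposite strand, which is the usual meaning of the -1 frame. The toy sequence and all function names are hypothetical and are not taken from the paper.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def reversed_read(dna: str) -> str:
    """Assumption 1: same strand, base order reversed (no complementing)."""
    return dna.upper()[::-1]


def reverse_complement(dna: str) -> str:
    """Assumption 2: opposite strand read 5'->3' (the conventional -1 frame)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    toy_gene = "ATGGCCAAGTAA"  # hypothetical 4-codon example, not from the paper
    print("forward:           ", translate(toy_gene))
    print("reversed read:     ", translate(reversed_read(toy_gene)))
    print("reverse complement:", translate(reverse_complement(toy_gene)))
```

For this toy input the forward reading gives `MAK*`, the reversed read gives `NEPV`, and the reverse complement gives `LLGH`. A real analysis would also have to decide how to handle internal stop codons in the backward reading, which this sketch only marks with `*`.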

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank  Journal                                             Papers in training set  Percentile  Probability
1     Biosystems                                          18                      Top 0.1%    12.5%
2     Genome Biology and Evolution                        280                     Top 0.1%    10.1%
3     Journal of Molecular Evolution                      21                      Top 0.1%    6.4%
4     G3: Genes, Genomes, Genetics                        222                     Top 0.1%    6.4%
5     RNA                                                 169                     Top 0.1%    4.0%
6     International Journal of Molecular Sciences         453                     Top 2%      4.0%
7     PLOS Computational Biology                          1633                    Top 11%     3.1%
8     eLife                                               5422                    Top 29%     3.1%
9     Computational and Structural Biotechnology Journal  216                     Top 2%      2.7%
50% of probability mass above
10    ACS Synthetic Biology                               256                     Top 1%      2.5%
11    Nucleic Acids Research                              1128                    Top 8%      2.4%
12    Frontiers in Bioengineering and Biotechnology       88                      Top 1.0%    2.1%
13    PeerJ                                               261                     Top 6%      1.9%
14    iScience                                            1063                    Top 12%     1.9%
15    Protein Science                                     221                     Top 0.7%    1.9%
16    Molecular Biology and Evolution                     488                     Top 2%      1.9%
17    Proceedings of the National Academy of Sciences     2130                    Top 31%     1.8%
18    Frontiers in Genetics                               197                     Top 4%      1.8%
19    Frontiers in Molecular Biosciences                  100                     Top 1%      1.8%
20    Journal of Molecular Biology                        217                     Top 1%      1.8%
21    Open Biology                                        95                      Top 0.6%    1.7%
22    Metabolic Engineering                               68                      Top 0.5%    1.3%
23    Cell Systems                                        167                     Top 10%     1.0%
24    GENETICS                                            189                     Top 1%      0.9%
25    Cell                                                370                     Top 16%     0.8%
26    Nature Communications                               4913                    Top 61%     0.8%
27    Philosophical Transactions of the Royal Society B   51                      Top 5%      0.8%
28    Scientific Reports                                  3102                    Top 73%     0.8%
29    Journal of The Royal Society Interface              189                     Top 5%      0.7%
30    PLOS ONE                                            4510                    Top 68%     0.7%
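
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The Python sketch below is only a sanity check on the displayed, rounded values; the function name `journals_to_reach` is hypothetical, and the page's own unrounded probabilities are assumed to be close to these figures.

```python
# Predicted probabilities (in percent) for the first ten ranked journals,
# copied from the listing; these are rounded display values.
TOP_PROBABILITIES = [12.5, 10.1, 6.4, 6.4, 4.0, 4.0, 3.1, 3.1, 2.7, 2.5]


def journals_to_reach(probabilities, threshold=50.0):
    """Return how many top-ranked entries are needed for the running sum
    to reach the threshold, or None if the threshold is never reached."""
    running_total = 0.0
    for count, probability in enumerate(probabilities, start=1):
        running_total += probability
        if running_total >= threshold:
            return count
    return None


if __name__ == "__main__":
    print(journals_to_reach(TOP_PROBABILITIES))  # prints 9
```

With the rounded values, the first eight journals sum to about 49.6% and the ninth brings the total to about 52.3%, which agrees with the cutoff shown after rank 9.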