Repurposing the dark genome. II - Reverse Proteins

Nayak, S.; Dhar, P. K.

2023-03-21 synthetic biology

10.1101/2023.03.20.533367 bioRxiv

Based on the expression blueprint encoded in the genome, three groups of sequences have been identified: protein-encoding, RNA-encoding, and non-expressing. We asked: Why did nature choose a particular DNA sequence for expression? Did she sample every possibility, approving some for RNA synthesis, some for protein synthesis, and retiring or ignoring the rest? If evolution randomly selected sequences for metabolic trials, how much non-utilized (non-expressing) and under-utilized (only RNA-encoding) information is currently available for innovation? These questions led us to experimentally synthesize functional proteins from intergenic sequences of E. coli (Dhar et al. 2009). The current work extends that original report by considering natural protein-coding sequences read backward to generate new possibilities. Reverse proteins are full-length translation equivalents of existing protein-coding genes read in the -1 frame. The structural, functional and interaction predictions of reverse proteins in E. coli, S. cerevisiae and D. melanogaster open up a new opportunity to produce first-in-class proteins towards functional endpoints. This study points to a large untapped genomic space from both the fundamental biology and the applications perspectives.
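
To make the idea of a "reverse protein" concrete, the following is a minimal Python sketch, not the authors' pipeline. The abstract does not say whether "read backward" means reading the same strand base by base in reverse or reading the reverse-complement strand (one common meaning of the -1 frame), so the sketch shows both as assumptions. The function names and the 12-base example sequence are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# organism-specific or organelle-specific code tables are needed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate DNA codon by codon from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reversed_read(cds: str) -> str:
    """Interpretation A (assumed): same strand, bases read in reverse order."""
    return translate(cds[::-1])


def reverse_complement_read(cds: str) -> str:
    """Interpretation B (assumed): first frame of the reverse-complement strand."""
    return translate(cds.upper().translate(COMPLEMENT)[::-1])


# Hypothetical toy coding sequence, for illustration only.
toy_cds = "ATGGCTAAATGA"
forward = translate(toy_cds)
backward = reversed_read(toy_cds)
antisense = reverse_complement_read(toy_cds)
```

Either reading yields a peptide unrelated to the forward product. Real coding sequences read this way will often contain internal stop codons, so how "full-length" translation is obtained is something this sketch does not address.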

Matching journals

●Non-profit ◐University press ○Commercial

The top 9 journals account for 50% of the predicted probability mass.

○ 18 papers in training set

Genome Biology and Evolution

◐ 280 papers in training set

Journal of Molecular Evolution

○ 21 papers in training set

G3: Genes, Genomes, Genetics

◐ 222 papers in training set

● 169 papers in training set

International Journal of Molecular Sciences

○ 453 papers in training set

PLOS Computational Biology

● 1633 papers in training set

● 5422 papers in training set

Computational and Structural Biotechnology Journal

● 216 papers in training set

50% of probability mass above

ACS Synthetic Biology

● 256 papers in training set

Nucleic Acids Research

◐ 1128 papers in training set

Frontiers in Bioengineering and Biotechnology

○ 88 papers in training set

◐ 261 papers in training set

○ 1063 papers in training set

Protein Science

○ 221 papers in training set

Molecular Biology and Evolution

◐ 488 papers in training set

Proceedings of the National Academy of Sciences

● 2130 papers in training set

Frontiers in Genetics

○ 197 papers in training set

Frontiers in Molecular Biosciences

○ 100 papers in training set

Journal of Molecular Biology

○ 217 papers in training set

● 95 papers in training set

Metabolic Engineering

○ 68 papers in training set

○ 167 papers in training set

◐ 189 papers in training set

○ 370 papers in training set

Nature Communications

○ 4913 papers in training set

Philosophical Transactions of the Royal Society B

● 51 papers in training set

Scientific Reports

○ 3102 papers in training set

Journal of The Royal Society Interface

● 189 papers in training set

● 4510 papers in training set