Evidence for Numerous Embedded Antisense Overlapping Genes in Diverse E. coli Strains

Zehentner, B.; Ardern, Z.; Kreitmeier, M.; Scherer, S.; Neuhaus, K.

2020-11-18 microbiology
10.1101/2020.11.18.388249 bioRxiv
The genetic code allows six reading frames at a double-stranded DNA locus, and many open reading frames (ORFs) overlap extensively with ORFs of annotated genes (e.g., at least 30 bp or having an embedded ORF). Currently, bacterial genome annotation systematically discards embedded overlapping ORFs of genes (OLGs) due to an assumed information-content constraint, and, consequently, very few OLGs are known. Here we use strand-specific RNAseq and ribosome profiling, detecting about 200 embedded or partially overlapping ORFs of gene candidates in the pathogen E. coli O157:H7 EDL933. These are typically short, many of them show clear promoter motifs as determined by Cappable-seq, indistinguishable from those of annotated genes, and are expressed at a low level. We could express most of them as stable proteins, and 49 displayed a potential phenotype. Ribosome profiling analyses in three other E. coli strains predicted between 84 and 190 embedded antisense OLGs per strain except in E. coli K-12, which is an atypical lab strain. We also found evidence of homology to annotated genes for 100 to 300 OLGs per E. coli strain investigated. Based on this evidence we suggest that bacterial OLGs deserve attention with respect to genome annotation and coding complexity of bacterial genomes. Such sequences may constitute an important coding reserve, opening up new research in genetics and evolutionary biology.
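The six-reading-frame idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline, which rests on RNAseq and ribosome profiling evidence; it only enumerates ATG-to-stop ORFs on both strands and checks whether one ORF is embedded antisense within another. The function names, the `min_codons` threshold, and the restriction to ATG start codons are assumptions made for illustration.

```python
# Illustrative only: hypothetical helpers, not the method used in the preprint.
# Assumes an uppercase DNA string containing only A, C, G, T.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Scan all six frames for ATG...stop ORFs.

    Returns (strand, start, end) tuples; coordinates are 0-based,
    end-exclusive, and always given on the forward strand.
    min_codons counts the stop codon as well.
    """
    orfs = []
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:
                            # map reverse-strand positions back to forward coordinates
                            orfs.append((strand, n - end, n - start))
                    start = None
    return orfs


def is_embedded_antisense(candidate, gene):
    """True if candidate is on the opposite strand and lies fully inside gene."""
    return (candidate[0] != gene[0]
            and gene[1] <= candidate[1]
            and candidate[2] <= gene[2])


# Toy locus: a forward ORF that fully contains a short reverse-strand ORF.
toy = "ATGGCATTACCCGGGCATGCATAA"
print(find_orfs(toy))  # → [('+', 0, 24), ('-', 6, 18)]
```

In the toy sequence the reverse-strand ORF at positions 6 to 18 sits entirely inside the forward ORF at 0 to 24, which is the "embedded antisense" arrangement the abstract describes. Real candidate detection would also need expression and translation evidence, as the abstract stresses.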

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Microbial Genomics: 204 papers in training set, Top 0.2%, 10.7%
2. Frontiers in Microbiology: 375 papers in training set, Top 0.6%, 9.4%
3. mSystems: 361 papers in training set, Top 0.8%, 9.4%
4. Nucleic Acids Research: 1128 papers in training set, Top 2%, 8.6%
5. Scientific Reports: 3102 papers in training set, Top 16%, 6.5%
6. BMC Genomics: 328 papers in training set, Top 0.8%, 3.7%
7. mBio: 750 papers in training set, Top 6%, 2.5%

50% of probability mass above

8. mSphere: 281 papers in training set, Top 2%, 2.5%
9. Frontiers in Cellular and Infection Microbiology: 98 papers in training set, Top 2%, 2.4%
10. PLOS ONE: 4510 papers in training set, Top 46%, 2.4%
11. Nature Communications: 4913 papers in training set, Top 46%, 2.1%
12. MicrobiologyOpen: 18 papers in training set, Top 0.1%, 2.1%
13. Environmental Microbiology: 119 papers in training set, Top 2%, 1.5%
14. microLife: 19 papers in training set, Top 0.1%, 1.5%
15. PLOS Computational Biology: 1633 papers in training set, Top 19%, 1.3%
16. Philosophical Transactions of the Royal Society B: 51 papers in training set, Top 4%, 1.0%
17. RNA Biology: 70 papers in training set, Top 0.4%, 1.0%
18. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 2%, 1.0%
19. International Journal of Molecular Sciences: 453 papers in training set, Top 12%, 0.9%
20. Microbial Biotechnology: 29 papers in training set, Top 0.7%, 0.9%
21. Frontiers in Molecular Biosciences: 100 papers in training set, Top 4%, 0.9%
22. NAR Genomics and Bioinformatics: 214 papers in training set, Top 3%, 0.9%
23. Molecular Biology and Evolution: 488 papers in training set, Top 4%, 0.9%
24. Genome Research: 409 papers in training set, Top 4%, 0.8%
25. Gut Microbes: 70 papers in training set, Top 0.9%, 0.8%
26. Microbiology Spectrum: 435 papers in training set, Top 5%, 0.8%
27. eLife: 5422 papers in training set, Top 55%, 0.8%
28. Life Science Alliance: 263 papers in training set, Top 1%, 0.8%
29. Genome Biology and Evolution: 280 papers in training set, Top 2%, 0.8%
30. Microbiology: 57 papers in training set, Top 1%, 0.8%
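
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below uses the first eight percentages from the list; the function name `ranks_to_reach` is hypothetical, and because the listed percentages are rounded to one decimal place, the result is only approximate.

```python
def ranks_to_reach(probabilities, target=50.0):
    """Return (rank, cumulative_total) at which the running sum first reaches target.

    probabilities: percentages ordered by rank, highest first.
    Returns None if the target is never reached.
    """
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= target:
            return rank, total
    return None


# Predicted probabilities (%) for ranks 1 to 8, copied from the list.
top_probabilities = [10.7, 9.4, 9.4, 8.6, 6.5, 3.7, 2.5, 2.5]
rank, total = ranks_to_reach(top_probabilities)
print(rank, round(total, 1))  # → 7 50.8
```

The running sum is about 48.3% after rank 6 and about 50.8% after rank 7, which matches the position of the 50% marker in the list.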