A revised genome annotation of the model cyanobacterium Synechocystis based on start and stop codon-enriched ribosome profiling and proteogenomics

Hadjeras, L.; Krauspe, V.; Gelhausen, R.; Heiniger, B.; Spaet, P.; Reimann, V.; Jaques, G.; Minges, P.; Bilger, R.; Gerstner, M.; Macek, B.; Ahrens, C. H.; Backofen, R.; Sharma, C. M.; Hess, W. R.

2026-01-19 microbiology
10.64898/2025.12.20.695498 bioRxiv
Cyanobacteria are important primary producers and are used in biotechnology as microbial cell factories due to their ability to use solar light for oxygenic photosynthesis. Synechocystis sp. PCC 6803 is a popular model cyanobacterium, yet there are ambiguities in the precise coding regions of many genes, and numerous genes encoding small proteins have remained undetected. Here we present the results of a ribosome profiling (Ribo-seq) analysis involving inhibitors that stall ribosomes at translation initiation and termination sites (TIS- and TTS-Ribo-seq), combined with a proteogenomic reevaluation and reannotation of its entire genome. We report evidence for the translation of 3,050 annotated genes based on proteogenomics (83%), of 3,492 based on Ribo-seq (95.2%), and of 3,009 supported by both methods (82%). The data suggested both novel protein-coding genes and corrections for annotated ones. We validated 15 novel small proteins translated from antisense RNAs and from intergenic and intragenic regions, and identified 69 novel, mostly small proteins based on proteogenomics. With slr0489, slr1079 and slr1082 we identified three genes with ~300 nt long intragenic out-of-frame coding regions and show that both the internal and host reading frames are translated. The resulting proteins interact with each other, resembling certain defense or toxin-antitoxin systems. Our data illustrate the enormous value of consolidating genome annotations in the context of integrated experimental data and suggest that genome annotations in general need to be extended and revised. All of our data can be accessed via an intuitive and interactive genome browser platform at https://www.bioinf.uni-freiburg.de/~ribobase/.

Matching journals

The top 12 journals account for 50% of the predicted probability mass.

1. Scientific Reports: 10.4% (3102 papers in training set, Top 5%)
2. Nature Communications: 8.6% (4913 papers in training set, Top 21%)
3. Photosynthesis Research: 5.0% (15 papers in training set, Top 0.1%)
4. Nucleic Acids Research: 5.0% (1128 papers in training set, Top 4%)
5. Proceedings of the National Academy of Sciences: 3.7% (2130 papers in training set, Top 19%)
6. Frontiers in Microbiology: 3.7% (375 papers in training set, Top 2%)
7. Microbiology Resource Announcements: 2.8% (22 papers in training set, Top 0.1%)
8. ACS Synthetic Biology: 2.8% (256 papers in training set, Top 1%)
9. Communications Biology: 2.8% (886 papers in training set, Top 3%)
10. Algal Research: 2.5% (20 papers in training set, Top 0.1%)
11. ISME Communications: 2.5% (103 papers in training set, Top 0.8%)
12. PLOS ONE: 2.1% (4510 papers in training set, Top 47%)

50% of probability mass above

13. Metabolic Engineering: 2.1% (68 papers in training set, Top 0.3%)
14. mBio: 2.1% (750 papers in training set, Top 6%)
15. New Phytologist: 1.7% (309 papers in training set, Top 3%)
16. mSystems: 1.7% (361 papers in training set, Top 5%)
17. RNA Biology: 1.7% (70 papers in training set, Top 0.2%)
18. Microorganisms: 1.4% (101 papers in training set, Top 1%)
19. NAR Genomics and Bioinformatics: 1.4% (214 papers in training set, Top 2%)
20. International Journal of Molecular Sciences: 1.3% (453 papers in training set, Top 10%)
21. Genome Biology and Evolution: 1.3% (280 papers in training set, Top 1%)
22. The Plant Cell: 0.9% (141 papers in training set, Top 2%)
23. EMBO reports: 0.9% (136 papers in training set, Top 5%)
24. Life Science Alliance: 0.9% (263 papers in training set, Top 1.0%)
25. PLOS Computational Biology: 0.8% (1633 papers in training set, Top 23%)
26. Peer Community Journal: 0.8% (254 papers in training set, Top 3%)
27. RNA: 0.8% (169 papers in training set, Top 0.4%)
28. Genes: 0.8% (126 papers in training set, Top 3%)
29. G3 Genes|Genomes|Genetics: 0.8% (351 papers in training set, Top 2%)
30. Biology Direct: 0.8% (10 papers in training set, Top 0.1%)
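
The statement that the top 12 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, and the inputs are the rounded percentages shown in the list, so the totals are approximate.

```python
# Predicted probabilities (%) for the 12 top-ranked journals, copied from the list.
# These are the rounded values displayed on the page, not the underlying model output.
top_probs = [10.4, 8.6, 5.0, 5.0, 3.7, 3.7, 2.8, 2.8, 2.8, 2.5, 2.5, 2.1]


def journals_to_reach(probs, threshold):
    """Return (rank, cumulative %) at the first rank whose running sum reaches threshold.

    Hypothetical helper for illustration; returns (None, total) if never reached.
    """
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total


rank, total = journals_to_reach(top_probs, 50.0)
print(rank, round(total, 1))
```

With these rounded values, the running sum stays just below 50% after rank 11 and crosses it at rank 12, which agrees with where the "50% of probability mass above" divider sits in the list.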