
A Statistical Detector for Ribosomal Frameshifts and Dual Encodings based on Ribosome Profiling

Yurovsky, A.; Gardin, J.; Futcher, B.; Skiena, S.

2022-06-06 bioinformatics
10.1101/2022.06.06.495024 bioRxiv

During protein synthesis, the ribosome shifts along the messenger RNA (mRNA) by exactly three nucleotides for each amino acid added to the protein being translated. However, in special cases, the sequence of the mRNA induces the ribosome to shift forward by either two or four nucleotides. This shifts the "reading frame" in which the mRNA is translated, and gives rise to an otherwise unexpected protein. Such "programmed frameshifts" are well known in viruses, including coronavirus, and a few cases of programmed frameshifting are also known in cellular genes. However, there is no good way, either experimental or informatic, to identify novel cases of programmed frameshifting. Thus, it is possible that substantial numbers of cellular proteins generated by programmed frameshifting in humans and other organisms remain unknown. Here, we build on prior work observing that data from ribosome profiling can be analyzed for anomalies in mRNA reading frame periodicity to identify putative programmed frameshifts. We develop a statistical framework to identify all likely frameshift positions in a genome, even for very low frameshifting rates. We also develop a frameshift simulator for ribosome profiling data to verify our algorithm. We show high sensitivity of prediction on the simulated data, retrieving 97.4% of the simulated frameshifts. Furthermore, our method found all three of the known yeast genes with programmed frameshifts. We list several hundred yeast genes that may contain +1 or -1 frameshifts. Our results suggest there could be a large number of unannotated alternative proteins in the yeast genome generated by programmed frameshifting. This motivates further study and parallel investigations in the human genome. Frameshift Detector algorithms and instructions are available on GitHub: https://github.com/ayurovsky/Frame-Shift-Detector.
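To make the idea of a "reading frame periodicity anomaly" concrete, here is a minimal sketch. It is not the authors' algorithm (their implementation is in the linked repository). It assumes a hypothetical per-nucleotide list of footprint counts for one gene, and swaps in a plain chi-square test of homogeneity on frame usage upstream versus downstream of a candidate position. The function names, window size, and toy simulator are all illustrative assumptions.

```python
import math
import random


def frame_counts(reads, start, end):
    """Sum footprint counts by frame (position mod 3) over reads[start:end]."""
    counts = [0, 0, 0]
    for pos in range(start, end):
        counts[pos % 3] += reads[pos]
    return counts


def frame_change_pvalue(reads, pos, window=90):
    """Chi-square test (2x3 table, 2 degrees of freedom) comparing frame
    usage in the window before `pos` with the window after it.
    A small p-value means the frame distribution differs across `pos`."""
    up = frame_counts(reads, max(0, pos - window), pos)
    down = frame_counts(reads, pos, min(len(reads), pos + window))
    total = sum(up) + sum(down)
    if sum(up) == 0 or sum(down) == 0:
        return 1.0  # no reads on one side: nothing to compare
    stat = 0.0
    for row in (up, down):
        for f in range(3):
            expected = sum(row) * (up[f] + down[f]) / total
            if expected > 0:
                stat += (row[f] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 2 d.o.f. is exp(-x/2).
    return math.exp(-stat / 2)


def simulate_reads(length, shift_pos, seed=0):
    """Toy simulator (assumption, not the paper's): reads pile up in frame 0
    before `shift_pos` and in frame 1 after it, mimicking a +1 frameshift."""
    rng = random.Random(seed)
    reads = []
    for pos in range(length):
        dominant = 0 if pos < shift_pos else 1
        if pos % 3 == dominant:
            reads.append(rng.randint(5, 15))
        else:
            reads.append(rng.randint(0, 3))
    return reads


reads = simulate_reads(600, 300)
p_shift = frame_change_pvalue(reads, 300)    # at the simulated frameshift
p_control = frame_change_pvalue(reads, 150)  # inside the unshifted region
print(f"p at shift: {p_shift:.3g}, p at control: {p_control:.3g}")
```

A real detector would also have to handle low coverage, multiple testing across every position in every gene, and partial frameshifting rates, which this sketch ignores.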

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | BMC Bioinformatics | 383 | Top 0.1% | 23.5%
2 | PLOS Computational Biology | 1633 | Top 2% | 12.9%
3 | Bioinformatics | 1061 | Top 4% | 6.6%
4 | Journal of Bioinformatics and Systems Biology | 14 | Top 0.1% | 5.1%
5 | PLOS ONE | 4510 | Top 37% | 3.7%
(50% of probability mass above)
6 | Journal of Molecular Biology | 217 | Top 0.7% | 3.2%
7 | Frontiers in Genetics | 197 | Top 3% | 2.9%
8 | Scientific Reports | 3102 | Top 43% | 2.9%
9 | Nucleic Acids Research | 1128 | Top 7% | 2.7%
10 | BioData Mining | 15 | Top 0.2% | 1.9%
11 | Bioinformatics Advances | 184 | Top 2% | 1.9%
12 | Frontiers in Molecular Biosciences | 100 | Top 1% | 1.8%
13 | NAR Genomics and Bioinformatics | 214 | Top 2% | 1.5%
14 | Biosystems | 18 | Top 0.2% | 1.5%
15 | Computational and Structural Biotechnology Journal | 216 | Top 5% | 1.4%
16 | iScience | 1063 | Top 21% | 1.3%
17 | Journal of Computational Biology | 37 | Top 0.4% | 1.0%
18 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 | Top 0.4% | 0.9%
19 | PLOS Genetics | 756 | Top 12% | 0.9%
20 | Frontiers in Physiology | 93 | Top 5% | 0.9%
21 | PeerJ | 261 | Top 12% | 0.9%
22 | Biology | 43 | Top 2% | 0.8%
23 | RNA | 169 | Top 0.4% | 0.8%
24 | Database | 51 | Top 0.9% | 0.8%
25 | F1000Research | 79 | Top 5% | 0.7%
26 | Frontiers in Bioinformatics | 45 | Top 1% | 0.5%
27 | Genome Research | 409 | Top 5% | 0.5%
28 | Computers in Biology and Medicine | 120 | Top 6% | 0.5%
29 | GigaScience | 172 | Top 4% | 0.5%
30 | Physical Biology | 43 | Top 3% | 0.5%