
RiboBA: a bias-aware probabilistic framework for robust ORF identification across diverse ribosome profiling protocols

Bai, J.; Yang, R.

2026-03-19 bioinformatics
10.64898/2026.03.17.712439 bioRxiv

By mapping ribosome-protected fragments (RPFs) genome-wide, ribosome profiling (Ribo-seq) has uncovered extensive translation beyond conventional coding sequences, revealing non-canonical ORFs (ncORFs) with emerging roles in diverse biological processes. However, protocol-induced biases introduced during library construction can substantially distort RPF signals. Most existing ORF callers are not designed to explicitly account for such artifacts, limiting robust ncORF identification. Here, we present RiboBA, a bias-aware probabilistic framework to address this challenge. RiboBA consists of two main components: a generative module that recovers protocol-induced biases and codon-level ribosome occupancy, and a supervised module that identifies translated ORFs and initiation sites using the resulting bias-adjusted profiles. Evaluated through simulations and on a range of Ribo-seq datasets (particularly supported by cell-type-specific immunopeptidomics), RiboBA robustly recovers protocol-induced parameters and achieves superior accuracy and sensitivity in ncORF identification. Notably, RiboBA performs particularly well on RNase I libraries with attenuated three-nucleotide periodicity, as well as on MNase and nuclease P1 libraries, while maintaining competitive runtimes. In a Drosophila case study, RiboBA identifies conserved ncORFs with coding potential, including recurrent upstream translation of ThrRS and Mettl2 that suggests a potential threonine-specific translational control axis.

[Graphical Abstract: Figure 1]

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Nature Biotechnology; 147 papers in training set; Top 0.5%; 12.0%
2. Nature Methods; 336 papers in training set; Top 0.8%; 12.0%
3. Cell Systems; 167 papers in training set; Top 1%; 9.8%
4. Genome Research; 409 papers in training set; Top 0.2%; 8.9%
5. Bioinformatics; 1061 papers in training set; Top 3%; 8.2%

50% of probability mass above

6. Genome Biology; 555 papers in training set; Top 0.9%; 6.6%
7. Nucleic Acids Research; 1128 papers in training set; Top 3%; 6.6%
8. Nature Communications; 4913 papers in training set; Top 34%; 4.7%
9. Briefings in Bioinformatics; 326 papers in training set; Top 2%; 3.8%
10. PLOS Computational Biology; 1633 papers in training set; Top 11%; 3.0%
11. Nature; 575 papers in training set; Top 10%; 1.8%
12. Cell Reports Methods; 141 papers in training set; Top 2%; 1.7%
13. Cell Genomics; 162 papers in training set; Top 3%; 1.7%
14. Bioinformatics Advances; 184 papers in training set; Top 3%; 1.6%
15. Genomics, Proteomics & Bioinformatics; 171 papers in training set; Top 4%; 1.3%
16. NAR Genomics and Bioinformatics; 214 papers in training set; Top 3%; 1.2%
17. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 40%; 0.9%
18. Nature Machine Intelligence; 61 papers in training set; Top 3%; 0.8%
19. BMC Bioinformatics; 383 papers in training set; Top 7%; 0.7%
20. iScience; 1063 papers in training set; Top 34%; 0.7%
21. Science; 429 papers in training set; Top 20%; 0.7%
22. Journal of Molecular Biology; 217 papers in training set; Top 4%; 0.7%
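
The 50% cutoff marked in the ranked list above can be checked with a running sum over the listed probabilities. The following is a minimal sketch, not part of the listing itself: the helper name `rank_reaching` is hypothetical, and the values are simply the percentages copied from the list in rank order.

```python
# Predicted probabilities (percent), copied from the ranked list in rank order.
probs = [12.0, 12.0, 9.8, 8.9, 8.2, 6.6, 6.6, 4.7, 3.8, 3.0, 1.8,
         1.7, 1.7, 1.6, 1.3, 1.2, 0.9, 0.8, 0.7, 0.7, 0.7, 0.7]


def rank_reaching(values, threshold):
    """Return (rank, running_total) for the first rank whose running total
    meets or exceeds the threshold; rank is None if it is never reached."""
    total = 0.0
    for rank, value in enumerate(values, start=1):
        total += value
        if total >= threshold:
            return rank, total
    return None, total


rank, cumulative = rank_reaching(probs, 50.0)
print(rank, round(cumulative, 1))
```

Under these values the running total first passes 50% at rank 5, which agrees with the statement that the top 5 journals account for 50% of the predicted probability mass.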