Benchmarking Tools for Identification of rRNA Modifications in Escherichia coli using Oxford Nanopore Direct RNA Sequencing

Morampalli, B. R.; Silander, O. K.

2026-04-17 bioinformatics
10.64898/2026.04.15.718756 bioRxiv

RNA modifications are important for RNA structure, stability, and ribosome function, but their identification and localisation remain challenging. Oxford Nanopore direct RNA sequencing (DRS) enables modification-agnostic detection in native RNA, but existing tool benchmarks have focused almost exclusively on m6A in eukaryotic mRNA, leaving multi-modification tool performance in bacterial systems largely untested. Here, we benchmark ten RNA modification detection tools spanning signal-comparison, error-rate, and hybrid approaches on Escherichia coli K-12 MG1655 16S and 23S rRNA, which harbour 11 and 25 known modified sites, respectively, across 17 modification types. Using native RNA and in vitro transcribed (IVT) unmodified RNA, we evaluate performance across 25 coverage levels (5x to 1000x). DiffErr and JACUSA2 showed the strongest discrimination performance (AUROC >0.9 on both 16S and 23S rRNA), with DiffErr achieving the highest F1 score on 16S and JACUSA2 showing the most consistent precision-recall balance across both rRNAs. Both tools achieved full transcript-wide scoring and, along with DRUMMER, exact positional localisation. Several other tools produced no output at many rRNA positions, and restricting evaluation to reported positions inflated apparent performance. Signal-based tools showed a systematic 1-4 nucleotide 5' offset from known modified positions, consistent with the ~5-mer nucleotide stretch present in the read head of the nanopore; applying tool-specific offset corrections substantially improved per-site recovery and reduced false positives, with marked gains for tools such as EpiNano and nanoDoc. At single-site resolution, no known modified site was recovered by all tools, and several m5C, m5U, and m6A sites were missed by the majority of tools.
Tool combination analysis showed that pairing error-rate-based tools with offset-corrected signal-based tools improved site recovery beyond any individual tool, with the best three-tool combination recovering 30 of the 36 known sites while maintaining low false positive rates. These results establish that discrimination metrics (e.g. AUROC) alone are insufficient to evaluate modification detection tools: output completeness, positional precision, and per-modification-type sensitivity should be reported alongside standard benchmarking metrics.
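
The offset correction described in the abstract can be illustrated with a minimal Python sketch. It assumes a tool reports modified positions as a set of 1-based coordinates and that a single fixed offset is applied per tool. The function names (`shift_calls`, `score_calls`), the toy coordinates, and the 2-nucleotide offset are all hypothetical and are not taken from the paper or from any of the benchmarked tools.

```python
def shift_calls(called_positions, offset):
    """Shift every called position by `offset` nt (positive = toward the 3' end)."""
    return {pos + offset for pos in called_positions}


def score_calls(called_positions, known_sites):
    """Return (TP, FP, FN, F1) using exact-position matching against known sites."""
    called, known = set(called_positions), set(known_sites)
    tp = len(called & known)   # called positions that are known modified sites
    fp = len(called - known)   # called positions with no known modification
    fn = len(known - called)   # known sites the tool did not call
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return tp, fp, fn, f1


# Toy data (hypothetical): three known sites, and a signal-based tool whose
# calls sit 2 nt to the 5' side of each site, plus one spurious call.
known_sites = {10, 25, 40}
raw_calls = {8, 23, 38, 50}
tool_offset = 2  # assumed tool-specific correction, in nucleotides

raw_score = score_calls(raw_calls, known_sites)
corrected_score = score_calls(shift_calls(raw_calls, tool_offset), known_sites)

print("raw (TP, FP, FN, F1):      ", raw_score)
print("corrected (TP, FP, FN, F1):", corrected_score)
```

In this toy case the uncorrected calls recover none of the known sites and all count as false positives, whereas shifting them by the assumed offset recovers all three sites and leaves only the spurious call. How the authors derived their tool-specific corrections is not described in the abstract, so the fixed offset here is only a stand-in.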

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Nature Communications | 4913 papers in training set | Top 3% | 22.6%
2 | Nucleic Acids Research | 1128 papers in training set | Top 2% | 10.1%
3 | Nature Biotechnology | 147 papers in training set | Top 0.9% | 8.4%
4 | Bioinformatics | 1061 papers in training set | Top 4% | 6.4%
5 | RNA | 169 papers in training set | Top 0.1% | 4.9%
50% of probability mass above
6 | Nature Methods | 336 papers in training set | Top 2% | 4.3%
7 | Genome Biology | 555 papers in training set | Top 2% | 4.0%
8 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 0.6% | 3.7%
9 | PLOS ONE | 4510 papers in training set | Top 41% | 3.3%
10 | Scientific Reports | 3102 papers in training set | Top 47% | 2.4%
11 | PLOS Computational Biology | 1633 papers in training set | Top 15% | 1.8%
12 | BMC Bioinformatics | 383 papers in training set | Top 4% | 1.7%
13 | Genome Research | 409 papers in training set | Top 2% | 1.7%
14 | RNA Biology | 70 papers in training set | Top 0.3% | 1.2%
15 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.2%
16 | Journal of Molecular Biology | 217 papers in training set | Top 2% | 1.2%
17 | Cell Systems | 167 papers in training set | Top 10% | 1.1%
18 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 7% | 1.0%
19 | BMC Genomics | 328 papers in training set | Top 4% | 0.9%
20 | Cell Genomics | 162 papers in training set | Top 5% | 0.9%
21 | eLife | 5422 papers in training set | Top 56% | 0.8%
22 | Cell Reports Methods | 141 papers in training set | Top 5% | 0.7%
23 | Bioinformatics Advances | 184 papers in training set | Top 5% | 0.7%
24 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 48% | 0.6%
25 | Genome Medicine | 154 papers in training set | Top 9% | 0.6%