RdRpCATCH: A unified resource for RNA virus discovery using viral RNA-dependent RNA polymerase profile Hidden Markov models

Karapliafis, D.; Neri, U.; Olendraite, I.; Charon, J.; Sakaguchi, S.; Hou, X.; de Ridder, D.; Zwart, M. P.; Kupczok, A.

2026-02-06 bioinformatics
10.64898/2026.02.05.703936 bioRxiv
Recent advances in metatranscriptomics and large-scale mining of publicly available sequencing datasets have substantially expanded our knowledge of RNA virus diversity. Most genome mining approaches for detecting RNA viruses that encode RNA-dependent RNA polymerase (RdRp) rely on identifying this conserved protein, which is essential for the replication of RNA virus genomes. These approaches employ evolutionarily informed profile Hidden Markov Models (pHMMs) to scan large sequencing datasets for RdRp sequences. Recently, several new pHMM databases for RdRp detection have been released, each with distinct design principles, making it unclear which database is best for specific applications. Furthermore, these resources may be inaccessible to users without specialized computational expertise. Here we introduce the RdRp Collaborative Analysis Tool with Collections of pHMMs (RdRpCATCH: https://github.com/dimitris-karapliafis/RdRpCATCH), developed to consolidate publicly available RdRp pHMM resources into a single, accessible platform. RdRpCATCH enables the scanning of (meta)transcriptomic assemblies to discover RNA viruses and provides subsequent taxonomic annotation of detected contigs. A comparative analysis of RdRp pHMM databases reveals that most are highly effective at detecting known diversity of RNA viruses while minimizing false positives, supporting their joint use within RdRpCATCH. Certain databases are optimized for efficient scanning or exhibit high sensitivity, and we outline recommendations for their optimal use. RdRpCATCH is distributed as both a conda package and a web server application (https://rdrpcatch.bioinformatics.nl), facilitating access for researchers with diverse expertise. By integrating multiple pHMM resources, this unified framework addresses fragmentation in the field and reduces technical barriers to enable comprehensive viral discovery.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics (1061 papers in training set): Top 1%, 21.9%
2. NAR Genomics and Bioinformatics (214 papers in training set): Top 0.1%, 9.8%
3. PLOS Computational Biology (1633 papers in training set): Top 3%, 9.8%
4. BMC Bioinformatics (383 papers in training set): Top 1%, 8.0%
5. Bioinformatics Advances (184 papers in training set): Top 0.3%, 7.0%

50% of probability mass above

6. Nucleic Acids Research (1128 papers in training set): Top 3%, 6.6%
7. Virus Evolution (140 papers in training set): Top 0.3%, 6.1%
8. Viruses (318 papers in training set): Top 2%, 2.0%
9. Briefings in Bioinformatics (326 papers in training set): Top 4%, 1.6%
10. Journal of Molecular Biology (217 papers in training set): Top 2%, 1.6%
11. PLOS ONE (4510 papers in training set): Top 57%, 1.4%
12. GigaScience (172 papers in training set): Top 2%, 1.4%
13. Frontiers in Genetics (197 papers in training set): Top 6%, 1.3%
14. Nature Biotechnology (147 papers in training set): Top 5%, 1.3%
15. Computational and Structural Biotechnology Journal (216 papers in training set): Top 6%, 1.3%
16. Molecular Biology and Evolution (488 papers in training set): Top 4%, 0.9%
17. Cell Reports Methods (141 papers in training set): Top 4%, 0.9%
18. RNA (169 papers in training set): Top 0.4%, 0.8%
19. Nature Communications (4913 papers in training set): Top 64%, 0.7%
20. Scientific Reports (3102 papers in training set): Top 77%, 0.7%
21. Microbial Genomics (204 papers in training set): Top 3%, 0.6%
22. BMC Genomics (328 papers in training set): Top 7%, 0.6%
23. Genome Research (409 papers in training set): Top 5%, 0.6%
24. Patterns (70 papers in training set): Top 3%, 0.6%
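
The position of the 50% cutoff can be checked by accumulating the probabilities listed above. The following is a minimal sketch, not the site's own calculation: the helper name `rank_reaching` is hypothetical, and only the first eight listed values are included. Under these assumptions, the first four journals sum to 49.5% and the fifth brings the running total to 56.5%, which is consistent with the cutoff falling after rank 5.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-8, copied from the list above.
probabilities = [21.9, 9.8, 9.8, 8.0, 7.0, 6.6, 6.1, 2.0]


def rank_reaching(threshold, probs):
    """Return the 1-based rank at which the running total first reaches
    `threshold`, or None if the total never gets that high."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None


cutoff_rank = rank_reaching(50.0, probabilities)
print(cutoff_rank)  # prints 5
```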