RdRpCATCH: A unified resource for RNA virus discovery using viral RNA-dependent RNA polymerase profile Hidden Markov models
Karapliafis, D.; Neri, U.; Olendraite, I.; Charon, J.; Sakaguchi, S.; Hou, X.; de Ridder, D.; Zwart, M. P.; Kupczok, A.
Recent advances in metatranscriptomics and large-scale mining of publicly available sequencing datasets have substantially expanded our knowledge of RNA virus diversity. Most genome mining approaches for detecting RNA viruses that encode RNA-dependent RNA polymerase (RdRp) rely on identifying this conserved protein, which is essential for the replication of RNA virus genomes. These approaches employ evolutionarily informed profile Hidden Markov Models (pHMMs) to scan large sequencing datasets for RdRp sequences. Recently, several new pHMM databases for RdRp detection have been released, each with distinct design principles, making it unclear which database is best for specific applications. Furthermore, these resources may be inaccessible to users without specialized computational expertise. Here we introduce the RdRp Collaborative Analysis Tool with Collections of pHMMs (RdRpCATCH: https://github.com/dimitris-karapliafis/RdRpCATCH), developed to consolidate publicly available RdRp pHMM resources into a single, accessible platform. RdRpCATCH enables the scanning of (meta)transcriptomic assemblies to discover RNA viruses and provides subsequent taxonomic annotation of detected contigs. A comparative analysis of RdRp pHMM databases reveals that most are highly effective at detecting known diversity of RNA viruses while minimizing false positives, supporting their joint use within RdRpCATCH. Certain databases are optimized for efficient scanning or exhibit high sensitivity, and we outline recommendations for their optimal use. RdRpCATCH is distributed as both a conda package and a web server application (https://rdrpcatch.bioinformatics.nl), facilitating access for researchers with diverse expertise. By integrating multiple pHMM resources, this unified framework addresses fragmentation in the field and reduces technical barriers to enable comprehensive viral discovery.
Matching journals
The top 5 journals account for 50% of the predicted probability mass.