OpusTaxa: A Unified Workflow for Taxonomic Profiling, Assembly, and Functional Analysis of Shotgun Metagenomes

Chen, Y.-K.; Harker, C. M.; Pham, C. M.; Grundy, L.; Wardill, H. R.; Roach, M. J.; Ryan, F. J.

2026-04-19 bioinformatics
10.64898/2026.04.15.718825 bioRxiv
Shotgun metagenomics has become a cornerstone of microbiome research, yet the complexity of existing workflows remains a major barrier for life scientists without dedicated bioinformatics support. Manual database setup, detailed sample sheet preparation, and management of software dependencies can make routine analysis difficult and time-consuming. Cross-study comparisons are further hampered by inconsistent processing pipelines, database versions, and profiling strategies, limiting reproducibility and the potential for large-scale meta-analyses. We present OpusTaxa, an open-source Snakemake workflow that provides end-to-end processing of short paired-end shotgun metagenomic data with minimal configuration. Users provide either FASTQ files or Sequence Read Archive accessions; OpusTaxa automatically downloads required databases, performs quality control, removes host reads, and executes taxonomic profiling, metagenome assembly, and functional analysis. All analysis modules can be independently toggled, and per-sample outputs are automatically merged into harmonised, cross-sample tables ready for downstream exploration. Across two public datasets, we demonstrate how OpusTaxa can be used to compare consistency across complementary taxonomic profilers and to estimate microbial load in addition to standard metagenomic workflows.

Availability: OpusTaxa is freely available at https://github.com/yenkaiC/OpusTaxa. Documentation, test data, and example configurations are included in the repository.
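To illustrate what merging per-sample outputs into a harmonised cross-sample table can look like, here is a minimal sketch in plain Python. It is not taken from the OpusTaxa code base: the function name `merge_profiles`, the sample names, and the abundance values are all hypothetical, and the choice to fill taxa missing from a sample with 0.0 is an assumption made for the example.

```python
def merge_profiles(per_sample):
    """Merge {sample: {taxon: abundance}} into one taxon-by-sample table.

    Hypothetical helper for illustration only. Taxa absent from a sample
    are filled with 0.0 (an assumption, not OpusTaxa's documented behaviour).
    """
    samples = sorted(per_sample)
    # Union of all taxa seen in any sample, in a stable sorted order.
    taxa = sorted({taxon for profile in per_sample.values() for taxon in profile})
    header = ["taxon"] + samples
    rows = [
        [taxon] + [per_sample[sample].get(taxon, 0.0) for sample in samples]
        for taxon in taxa
    ]
    return header, rows


# Toy per-sample profiles (made-up values, relative abundance in percent).
profiles = {
    "sampleA": {"Bacteroides": 60.0, "Escherichia": 40.0},
    "sampleB": {"Bacteroides": 25.0, "Faecalibacterium": 75.0},
}

header, rows = merge_profiles(profiles)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

Each row of the merged table holds one taxon and each column one sample, so every sample shares the same set of rows regardless of which taxa were detected in it.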

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set, Top 1%, 22.0%
2. Microbiome: 139 papers in training set, Top 0.3%, 8.2%
3. Nature Biotechnology: 147 papers in training set, Top 1%, 6.7%
4. BMC Bioinformatics: 383 papers in training set, Top 2%, 4.7%
5. PLOS ONE: 4510 papers in training set, Top 32%, 4.7%
6. Scientific Data: 174 papers in training set, Top 0.4%, 3.9%
50% of probability mass above
7. mSystems: 361 papers in training set, Top 3%, 3.5%
8. Methods in Ecology and Evolution: 160 papers in training set, Top 0.9%, 3.5%
9. Cell Reports Methods: 141 papers in training set, Top 1.0%, 3.5%
10. Nature Communications: 4913 papers in training set, Top 43%, 3.0%
11. mSphere: 281 papers in training set, Top 2%, 2.7%
12. GigaScience: 172 papers in training set, Top 0.7%, 2.7%
13. Journal of Open Source Software: 22 papers in training set, Top 0.1%, 2.5%
14. PLOS Computational Biology: 1633 papers in training set, Top 13%, 2.4%
15. Genome Biology: 555 papers in training set, Top 4%, 2.0%
16. Bioinformatics Advances: 184 papers in training set, Top 2%, 2.0%
17. Nucleic Acids Research: 1128 papers in training set, Top 9%, 2.0%
18. Nature Protocols: 30 papers in training set, Top 0.1%, 1.8%
19. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.8%
20. Genome Research: 409 papers in training set, Top 3%, 1.5%
21. Nature Methods: 336 papers in training set, Top 5%, 1.3%
22. Microbial Genomics: 204 papers in training set, Top 2%, 0.9%
23. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 9%, 0.8%
24. Microbiology Resource Announcements: 22 papers in training set, Top 0.8%, 0.8%
25. Briefings in Bioinformatics: 326 papers in training set, Top 6%, 0.8%
26. BMC Genomics: 328 papers in training set, Top 6%, 0.7%
27. Scientific Reports: 3102 papers in training set, Top 76%, 0.7%
28. Frontiers in Bioinformatics: 45 papers in training set, Top 0.6%... 
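
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The sketch below is only an arithmetic check, not the page's own method: the helper name `journals_to_reach` is hypothetical, and the list holds the first ten probabilities copied from the ranking above.

```python
def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative) at which the running total first reaches threshold.

    Hypothetical helper; returns (None, total) if the threshold is never reached.
    """
    total = 0.0
    for rank, probability in enumerate(probabilities, start=1):
        total += probability
        if total >= threshold:
            return rank, total
    return None, total


# Predicted probabilities (percent) for ranks 1 to 10, as listed above.
probs = [22.0, 8.2, 6.7, 4.7, 4.7, 3.9, 3.5, 3.5, 3.5, 3.0]

rank, cumulative = journals_to_reach(probs, 50.0)
print(rank, round(cumulative, 1))
```

With these values the running total passes 50% at rank 6, with a cumulative mass of about 50.2%, which agrees with the cutoff shown in the list.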