
A genome-resolved view of the wastewater RNA virome

Kantor, R. S.; Shakya, M.; Ruth, N.; Rothman, J. A.; Rushford, C.; Gregory, D. A.; Epstein, A.; Kaufman, J. T.; Allen, J. E.; Chain, P. S. G.; O'Connor, D. H.; Johnson, M. C.

2026-05-22 infectious diseases
10.64898/2026.05.19.26353600 medRxiv

Sequencing-based wastewater surveillance is emerging as an important tool for pathogen-agnostic threat detection, potentially enabling threats to be identified before they are captured by clinical surveillance systems. However, sequences from human viral pathogens are typically low in abundance in wastewater, and much of the data cannot be classified at the read level. This presents a challenge: genomes of novel pathogens of interest may not assemble well, yet read-based methods cannot currently distinguish novel unclassified sequences from previously seen ones. Using ultra-deep untargeted sequencing of the wastewater RNA virome performed by the CASPER consortium (321 samples), we constructed a wastewater virus genome database (WVDB) to expand the set of available high-quality, non-redundant reference genomes. The first version of this database contains 21,015 near-complete viral genomes, the majority of which (79%) are ssRNA bacteriophages. We also recovered genomes of putative plant- and vertebrate-infecting viruses, human enteric viruses, and viruses whose hosts could not be predicted. Fewer than 4,000 genomes had matches in previously published virus genome databases, and WVDB captured around one fifth of the reads that Kraken2 could not classify. Further expansion of WVDB will provide a comprehensive resource of RNA virus genomes for characterizing viral diversity and dynamics in wastewater across space and time.
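The read-level comparison behind the "one fifth of unclassified reads" figure can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes Kraken2's standard per-read output (tab-delimited, column 1 is `C` or `U`, column 2 is the read ID) and assumes the set of read IDs that aligned to WVDB has already been produced by a separate read-mapping step. The function names and toy reads are hypothetical.

```python
import io
from typing import Iterable, Set, TextIO


def unclassified_read_ids(kraken_output: TextIO) -> Set[str]:
    """Collect IDs of reads that Kraken2 reported as unclassified.

    Assumes Kraken2's standard per-read output: tab-delimited, with
    column 1 'C' (classified) or 'U' (unclassified) and column 2 the read ID.
    """
    ids: Set[str] = set()
    for line in kraken_output:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "U":
            ids.add(fields[1])
    return ids


def fraction_captured(unclassified: Set[str], mapped_ids: Iterable[str]) -> float:
    """Share of Kraken2-unclassified reads that also map to the genome database."""
    if not unclassified:
        return 0.0
    captured = unclassified.intersection(mapped_ids)
    return len(captured) / len(unclassified)


# Hypothetical toy data: one classified read and five unclassified reads.
toy_kraken = io.StringIO(
    "C\tread1\t9606\t150\t9606:116\n"
    "U\tread2\t0\t150\t0:116\n"
    "U\tread3\t0\t150\t0:116\n"
    "U\tread4\t0\t150\t0:116\n"
    "U\tread5\t0\t150\t0:116\n"
    "U\tread6\t0\t150\t0:116\n"
)
# Hypothetical IDs of reads that aligned to the genome database.
mapped_to_db = {"read1", "read4"}

unclassified = unclassified_read_ids(toy_kraken)
frac = fraction_captured(unclassified, mapped_to_db)
print(f"{frac:.0%} of unclassified reads captured")  # 20% of unclassified reads captured
```

Only reads that Kraken2 left unclassified count toward the numerator, so `read1` (classified, but also mapped) is ignored. With paired-end data, read IDs from the classifier and the mapper may need to be normalized (for example, mate suffixes) before taking the intersection.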

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Environmental Science & Technology Letters: 22 papers in training set; Top 0.1%; 28.5%
2. Water Research: 74 papers in training set; Top 0.3%; 8.7%
3. Nature Communications: 4913 papers in training set; Top 25%; 7.4%
4. Microbiome: 139 papers in training set; Top 0.6%; 5.0%
5. Environmental Science & Technology: 64 papers in training set; Top 0.7%; 4.3%
50% of probability mass above
6. Med: 38 papers in training set; Top 0.1%; 2.4%
7. Scientific Reports: 3102 papers in training set; Top 52%; 1.9%
8. Science of The Total Environment: 179 papers in training set; Top 3%; 1.9%
9. The Lancet Microbe: 43 papers in training set; Top 0.5%; 1.9%
10. Nature Microbiology: 133 papers in training set; Top 2%; 1.8%
11. Emerging Infectious Diseases: 103 papers in training set; Top 1%; 1.8%
12. mSystems: 361 papers in training set; Top 4%; 1.8%
13. mBio: 750 papers in training set; Top 7%; 1.8%
14. Viruses: 318 papers in training set; Top 3%; 1.7%
15. Nature: 575 papers in training set; Top 12%; 1.5%
16. Clinical Infectious Diseases: 231 papers in training set; Top 3%; 1.4%
17. Environmental Health Perspectives: 17 papers in training set; Top 0.3%; 1.4%
18. Nature Biotechnology: 147 papers in training set; Top 6%; 1.3%
19. ISME Communications: 103 papers in training set; Top 1%; 1.3%
20. Microbiology Spectrum: 435 papers in training set; Top 4%; 0.9%
21. Cell Reports Methods: 141 papers in training set; Top 4%; 0.9%
22. Epidemics: 104 papers in training set; Top 2%; 0.8%
23. Genome Biology: 555 papers in training set; Top 7%; 0.8%
24. Eurosurveillance: 80 papers in training set; Top 1%; 0.8%
25. Cell Reports Medicine: 140 papers in training set; Top 8%; 0.7%
26. The Journal of Infectious Diseases: 182 papers in training set; Top 5%; 0.7%
27. Scientific Data: 174 papers in training set; Top 3%; 0.7%
28. FACETS: 11 papers in training set; Top 0.4%; 0.7%
29. Epidemiology and Infection: 84 papers in training set; Top 4%; 0.5%
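The placement of the "50% of probability mass above" divider can be checked with a running sum. This sketch assumes the last percentage on each line is that journal's predicted probability, which is consistent with the note that the top 5 journals account for 50% of the predicted probability mass; because the listed values are rounded, the sums are approximate. The helper name `ranks_to_reach` is hypothetical.

```python
from itertools import accumulate
from typing import Optional, Sequence

# Predicted probabilities (%) of the ten top-ranked journals, in rank order,
# copied from the list above (assumed to be the last value on each line).
top_probs = [28.5, 8.7, 7.4, 5.0, 4.3, 2.4, 1.9, 1.9, 1.9, 1.8]


def ranks_to_reach(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Smallest number of top-ranked entries whose summed probability meets or
    exceeds `threshold` (same units as `probs`); None if it is never reached."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k
    return None


# Running totals, rounded to one decimal to match the listed precision.
cumulative = [round(x, 1) for x in accumulate(top_probs)]
print(cumulative[:5])                   # [28.5, 37.2, 44.6, 49.6, 53.9]
print(ranks_to_reach(top_probs, 50.0))  # 5
```

The first four journals sum to about 49.6% and the first five to about 53.9%, so rank 5 is the first point at which half of the probability mass is covered.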