A Targeted Reference Database for Improved Analysis of Environmental 16S rRNA Oxford Nanopore Sequencing Data

Philip, M.; Nilsen, T.; Majaneva, S.; Pettersen, R.; Stokkan, M.; Ray, J. L.; Keeley, N.; Rudi, K.; Snipen, L.-G.

2024-10-03 bioinformatics
10.1101/2024.10.03.616456 bioRxiv
The Oxford Nanopore Technologies (ONT) sequencing platform is compact and efficient, making it suitable for rapid biodiversity assessments in remote areas. Despite its long reads, ONT has a higher error rate than other platforms, necessitating high-quality reference databases for accurate taxonomic assignments. However, the absence of targeted databases for underexplored habitats, such as the seafloor, limits ONT's broader applicability for exploratory analysis. To address this, we propose an approach for building environmentally targeted databases to improve 16S rRNA gene (16S) analysis of ONT data, using seafloor sediment samples from the Norwegian coast as an example. We started by using Illumina short-read data to create a database of full-length or near full-length 16S sequences from seafloor samples. First, amplicons were mapped to the SILVA database, and matches were added to our database. Unmatched amplicons were reconstructed using METASEED and Barrnap methodologies with amplicon and metagenome data. Finally, where the previous strategies did not succeed, we included the short-read sequences in the database. This resulted in AQUAeD-DB, which contains 14 545 16S sequences clustered at 95% identity. Comparative database analysis reveals that AQUAeD-DB provides consistent results for both Illumina and Nanopore read assignments (median correlation coefficient: 0.50), whereas a standard database showed a substantially weaker correlation. These findings also emphasize its potential to recognize both high- and low-abundance taxa, which could be key indicators in environmental studies. This work highlights the necessity of targeted databases for environmental analysis, especially for ONT-based studies, and lays the foundation for future extension of the database.
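The three-tier strategy described in the abstract (SILVA match first, then reconstruction, then the short read itself as a fallback) can be illustrated with a minimal sketch. This is not the authors' pipeline: it does not run SILVA mapping, METASEED, or Barrnap. It assumes those steps have already produced lookup tables, and the names `DbEntry`, `build_targeted_db`, `silva_hits`, and `reconstructed` are hypothetical, chosen only to show the priority order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DbEntry:
    amplicon_id: str
    sequence: str
    source: str  # "silva", "reconstructed", or "short_read"


def build_targeted_db(
    amplicons: Dict[str, str],
    silva_hits: Dict[str, str],
    reconstructed: Dict[str, str],
) -> List[DbEntry]:
    """Pick one sequence per amplicon using the tiered fallback.

    amplicons:     amplicon id -> short-read sequence (hypothetical input)
    silva_hits:    amplicon id -> matched SILVA sequence (assumed precomputed)
    reconstructed: amplicon id -> reconstructed 16S sequence (assumed precomputed)
    """
    entries = []
    for amp_id, short_seq in amplicons.items():
        if amp_id in silva_hits:
            # Tier 1: the amplicon matched a SILVA reference.
            entries.append(DbEntry(amp_id, silva_hits[amp_id], "silva"))
        elif amp_id in reconstructed:
            # Tier 2: no SILVA match, but a reconstruction succeeded.
            entries.append(DbEntry(amp_id, reconstructed[amp_id], "reconstructed"))
        else:
            # Tier 3: fall back to the short-read sequence itself.
            entries.append(DbEntry(amp_id, short_seq, "short_read"))
    return entries


# Toy data, invented purely for illustration.
toy_amplicons = {"a1": "ACGT", "a2": "GGCC", "a3": "TTAA"}
toy_silva = {"a1": "ACGTACGTACGT"}
toy_reconstructed = {"a2": "GGCCGGCCGG"}

db = build_targeted_db(toy_amplicons, toy_silva, toy_reconstructed)
for entry in db:
    print(entry.amplicon_id, entry.source, len(entry.sequence))
```

Recording the `source` of each entry keeps it visible which database sequences are full-length references and which are only short-read placeholders.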

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. PLOS ONE | 4510 papers in training set | Top 9% | 18.8%
2. Science of The Total Environment | 179 papers in training set | Top 0.7% | 10.2%
3. Environmental DNA | 49 papers in training set | Top 0.1% | 8.3%
4. Scientific Reports | 3102 papers in training set | Top 26% | 4.6%
5. Metabarcoding and Metagenomics | 12 papers in training set | Top 0.1% | 4.3%
6. Molecular Ecology Resources | 161 papers in training set | Top 0.3% | 4.0%

50% of probability mass above

7. Limnology and Oceanography: Methods | 11 papers in training set | Top 0.1% | 3.9%
8. Ecological Indicators | 20 papers in training set | Top 0.1% | 3.6%
9. PeerJ | 261 papers in training set | Top 3% | 2.9%
10. Frontiers in Microbiology | 375 papers in training set | Top 4% | 2.6%
11. Water Research | 74 papers in training set | Top 0.8% | 2.1%
12. BMC Bioinformatics | 383 papers in training set | Top 4% | 1.9%
13. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 5% | 1.5%
14. International Journal of Environmental Research and Public Health | 124 papers in training set | Top 4% | 1.5%
15. Scientific Data | 174 papers in training set | Top 1% | 1.3%
16. Methods in Ecology and Evolution | 160 papers in training set | Top 2% | 1.3%
17. Microbiology Resource Announcements | 22 papers in training set | Top 0.5% | 1.2%
18. Gigabyte | 60 papers in training set | Top 1% | 0.9%
19. Bioinformatics | 1061 papers in training set | Top 9% | 0.9%
20. Frontiers in Plant Science | 240 papers in training set | Top 5% | 0.9%
21. Microbiology Spectrum | 435 papers in training set | Top 5% | 0.8%
22. GigaScience | 172 papers in training set | Top 3% | 0.8%
23. mSystems | 361 papers in training set | Top 7% | 0.8%
24. Ecological Informatics | 29 papers in training set | Top 0.7% | 0.8%
25. International Journal of Molecular Sciences | 453 papers in training set | Top 15% | 0.8%
26. Malaria Journal | 48 papers in training set | Top 2% | 0.6%
27. Peer Community Journal | 254 papers in training set | Top 4% | 0.6%
28. Environmental Microbiome | 26 papers in training set | Top 0.7% | 0.5%
29. Environmental Research | 46 papers in training set | Top 2% | 0.5%
30. Briefings in Bioinformatics | 326 papers in training set | Top 8% | 0.5%
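
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked from the percentages listed above. The sketch below accumulates the first ten listed probabilities in rank order and reports how many journals are needed to reach the threshold. The function name `journals_needed` is an illustrative assumption, not part of the source page.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-10, copied from the list above.
probabilities = [18.8, 10.2, 8.3, 4.6, 4.3, 4.0, 3.9, 3.6, 2.9, 2.6]


def journals_needed(probs, threshold=50.0):
    """Return (count, cumulative mass) for the first rank reaching threshold."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count, total
    return None  # threshold not reached with the supplied values


count, mass = journals_needed(probabilities)
print(f"{count} journals cover {mass:.1f}% of the probability mass")
```

With these values the first five journals sum to about 46.2% and the sixth brings the total to about 50.2%, which is consistent with the marker placed after rank 6.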