Aviti Sequencing and Marker Gene Data Analysis

Gould, T. J.; Taylor, M.; Santelli, C.

2026-02-09 bioinformatics
10.64898/2026.02.06.704475 bioRxiv

Accurate identification of microbial species in complex populations and communities relies on isolating representative marker sequences (16S, ITS, and 18S) through DNA extraction, PCR, and sequencing. Aviti sequencing has improved the read quality and depth of marker gene sequencing. Quality scores exceeding Q40, which represent highly accurate sequencing, allow researchers to ask more questions of their marker gene data. However, this improvement in quality and throughput also brings a surprising increase in the diversity of amplicon sequence variants (ASVs), making further analysis and comparisons to previous studies on Illumina platforms challenging. This increased diversity causes downstream processing issues, including an over-reporting of chimeric ASVs. Here we identify this problem and put forward straightforward solutions that retain counts, reduce technically introduced diversity, and tie chimeric read identification to minimum parent distance. Using synthetic mock samples, we discovered that erroneous ASVs are systematically substitution errors introduced by the upstream PCR methods. This error can be reduced significantly bioinformatically by clustering ASVs within 99% similarity. Further, we highlight technically introduced variation resulting from variable region length, sample misassignment, and sample biomass. Collectively, these results improve the similarity of Aviti and Illumina datasets for better comparisons of microbial studies from different platforms.
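
As a rough illustration of the count-retaining 99% clustering step mentioned in the abstract, the sketch below greedily merges low-abundance ASVs into more abundant near-identical ones. It is not the authors' pipeline: the function names `identity` and `cluster_asvs` are hypothetical, and it assumes that ASVs differ only by substitutions, so sequences of equal length are compared position by position and sequences of different length are never merged. A real workflow would use alignment-based identity from an established clustering tool.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions; 0.0 if lengths differ (no alignment is attempted)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_asvs(counts: dict[str, int], threshold: float = 0.99) -> dict[str, int]:
    """Greedy abundance-sorted clustering (illustrative only).

    ASVs are visited from most to least abundant. Each ASV is merged into the
    first existing centroid with identity >= threshold, and its reads are added
    to that centroid so total counts are retained. Otherwise it becomes a new centroid.
    """
    clustered: dict[str, int] = {}
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in clustered:
            if identity(seq, centroid) >= threshold:
                clustered[centroid] += n
                break
        else:
            clustered[seq] = n
    return clustered


# Toy input: a 200-nt "true" sequence plus two substitution variants.
parent = "ACGT" * 50
one_sub = "T" + parent[1:]        # 1 mismatch, 99.5% identity
three_sub = "TTT" + parent[3:]    # 3 mismatches, 98.5% identity
result = cluster_asvs({parent: 1000, one_sub: 12, three_sub: 5})
```

In this toy input the single-substitution variant is absorbed into the abundant parent, while the three-substitution variant falls below the 99% threshold and remains a separate ASV; the total read count is unchanged.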

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Microbial Genomics (204 papers in training set; Top 0.2%): 12.8%
2. PLOS ONE (4510 papers in training set; Top 17%): 10.7%
3. mSystems (361 papers in training set; Top 0.6%): 10.7%
4. Frontiers in Microbiology (375 papers in training set; Top 1%): 6.5%
5. PeerJ (261 papers in training set; Top 0.9%): 5.0%
6. Scientific Reports (3102 papers in training set; Top 22%): 5.0%

50% of probability mass above

7. mSphere (281 papers in training set; Top 0.9%): 4.4%
8. Computational and Structural Biotechnology Journal (216 papers in training set; Top 3%): 2.1%
9. BMC Microbiology (35 papers in training set; Top 0.3%): 2.1%
10. PLOS Computational Biology (1633 papers in training set; Top 14%): 1.9%
11. Briefings in Bioinformatics (326 papers in training set; Top 3%): 1.8%
12. F1000Research (79 papers in training set; Top 2%): 1.7%
13. Microbiology Spectrum (435 papers in training set; Top 2%): 1.7%
14. Frontiers in Bioinformatics (45 papers in training set; Top 0.3%): 1.4%
15. mBio (750 papers in training set; Top 9%): 1.3%
16. GigaScience (172 papers in training set; Top 2%): 1.3%
17. Microorganisms (101 papers in training set; Top 1%): 1.1%
18. Frontiers in Cellular and Infection Microbiology (98 papers in training set; Top 4%): 1.0%
19. Microbiome (139 papers in training set; Top 2%): 1.0%
20. NAR Genomics and Bioinformatics (214 papers in training set; Top 3%): 1.0%
21. iScience (1063 papers in training set; Top 26%): 0.9%
22. BMC Bioinformatics (383 papers in training set; Top 6%): 0.9%
23. Nature Communications (4913 papers in training set; Top 62%): 0.8%
24. BioTechniques (24 papers in training set; Top 0.3%): 0.8%
25. Cell Reports Methods (141 papers in training set; Top 5%): 0.8%
26. Environmental Science & Technology Letters (22 papers in training set; Top 0.4%): 0.8%
27. Gigabyte (60 papers in training set; Top 2%): 0.7%
28. Animal Microbiome (26 papers in training set; Top 0.4%): 0.7%
29. Metabarcoding and Metagenomics (12 papers in training set; Top 0.1%): 0.7%
30. Microbiology (57 papers in training set; Top 1%): 0.7%