
Benchmarking Short-Read ITS2 and Full-Length ITS Sequencing Reveals Pipeline-Dependent Biases in Indoor Fungal Community Profiling

Dong, M.; Blackwood, D.; Lott, M. E. J.; Castro, S. P.; Larkin, X.; Clerkin, T.; Hemric, H.; Nash, J.; Kim, Y. J.; Arnold, J.; David, L. A.; Vilgalys, R.; Fodor, A. A.; Noble, R. T.

2026-05-15 microbiology
10.64898/2026.05.15.725464 bioRxiv

Short-read amplicon sequencing is widely used for fungal surveys but can limit taxonomic resolution. Long-read sequencing enables recovery of the full internal transcribed spacer (ITS) region and may improve ecological and taxonomic inference. Here, we conducted a paired comparison of Illumina ITS2 and PacBio HiFi full-length ITS sequencing using identical DNA extracts from built-environment air and surface samples (n = 68) collected across homes, a dormitory, and laboratories. Both datasets were taxonomically assigned using the same algorithm and reference database. We performed paired statistics, in-silico ITS2 trimming of long-read sequences, and cross-platform mapping at multiple identity thresholds. Full-length ITS provided higher taxonomic resolution, assigning a greater fraction of ASVs at the family (98% vs. 88%) and species (42% vs. 32%) ranks than ITS2 (paired Wilcoxon q = 0.002). Alpha-diversity comparisons showed similar Shannon diversity across pipelines, whereas richness metrics were consistently higher for full-length ITS. Beta-diversity analyses indicated broadly comparable community-level patterns, although full-length ITS revealed stronger sample-type- and location-associated structure (PERMANOVA R² ≥ 0.06, p = 0.0001). In-silico ITS2 trimming reduced these differences, indicating that amplicon length is a major contributor to enhanced taxonomic resolution and ecological inference. Cross-platform mapping further showed extensive one-to-many relationships between ITS2 and full-length ITS ASVs, consistent with increased sequence resolution in long-read data. Together, these results show that ITS2 sequencing provides robust community-level profiling, while full-length ITS enables improved richness estimates and finer ecological and taxonomic resolution. This paired, bias-aware framework provides a practical template for selecting fungal amplicon sequencing strategies in built-environment mycobiome studies.
Importance: Fungal communities in built environments influence indoor air quality and human exposure, yet their characterization depends strongly on sequencing strategy. This study provides a controlled, paired comparison of short-read ITS2 and long-read full-length ITS sequencing, showing that differences in amplicon length substantially contribute to variation in taxonomic resolution and ecological inference. While both approaches yield comparable community-level patterns, full-length ITS improves richness estimates, species-level assignment, and environmental discrimination by resolving sequence variation collapsed in ITS2 surveys. By integrating paired diversity analyses, in-silico ITS2 trimming, and cross-platform ASV mapping, this work offers a bias-aware framework for evaluating fungal amplicon pipelines. Importantly, improved species-level resolution enables functional interpretation of indoor fungi, for example the identification of taxa associated with pathogenic traits, allergen production, or toxin synthesis, supporting the development of more informative exposure metrics and targeted assays relevant to human health in built environments.
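
The abstract reports similar Shannon diversity across pipelines alongside consistently higher richness for full-length ITS. A minimal sketch of how those two metrics can diverge is given below. The count vectors are hypothetical toy values, not data from the study, and the function names `shannon` and `richness` are illustrative only. The idea is that splitting a low-abundance tail into a few extra rare ASVs raises richness sharply while changing Shannon diversity only slightly, because Shannon weights each ASV by its relative abundance.

```python
import math


def shannon(counts):
    """Shannon diversity (natural log) of a vector of ASV read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def richness(counts):
    """Observed richness: number of ASVs with at least one read."""
    return sum(1 for c in counts if c > 0)


# Hypothetical toy counts for one sample (assumption, not study data).
# The long-read profile resolves two extra rare ASVs out of the tail.
short_read_counts = [50, 30, 20]
long_read_counts = [50, 30, 18, 1, 1]

for label, counts in [("ITS2-like", short_read_counts),
                      ("full-length-like", long_read_counts)]:
    print(f"{label}: richness={richness(counts)}, "
          f"shannon={shannon(counts):.3f}")
```

In this toy case richness rises from 3 to 5 while Shannon diversity shifts by less than 0.1 nats, which is one way the two findings in the abstract can coexist.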

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Applied and Environmental Microbiology | 301 | Top 0.1% | 17.1% |
| 2 | Scientific Reports | 3102 | Top 11% | 8.2% |
| 3 | mSystems | 361 | Top 2% | 6.2% |
| 4 | mSphere | 281 | Top 1% | 4.2% |
| 5 | Microbiology Spectrum | 435 | Top 0.5% | 4.2% |
| 6 | Environmental Microbiology | 119 | Top 0.8% | 3.9% |
| 7 | Microbiome | 139 | Top 0.9% | 3.9% |
| 8 | Environmental Science & Technology | 64 | Top 0.8% | 3.9% |
| 9 | Frontiers in Microbiology | 375 | Top 3% | 3.0% |
| 10 | PLOS ONE | 4510 | Top 44% | 2.7% |
| 11 | mBio | 750 | Top 6% | 2.5% |
| 12 | Nature Communications | 4913 | Top 48% | 2.0% |
| 13 | Methods in Ecology and Evolution | 160 | Top 1% | 2.0% |
| 14 | Molecular Ecology Resources | 161 | Top 0.6% | 1.7% |
| 15 | FEMS Microbes | 14 | Top 0.1% | 1.6% |
| 16 | New Phytologist | 309 | Top 3% | 1.5% |
| 17 | Science of The Total Environment | 179 | Top 4% | 1.3% |
| 18 | International Journal of Food Microbiology | 11 | Top 0.3% | 1.3% |
| 19 | Environmental Microbiome | 26 | Top 0.3% | 1.2% |
| 20 | Microbial Ecology | 28 | Top 0.2% | 1.2% |
| 21 | ISME Communications | 103 | Top 2% | 1.1% |
| 22 | Environmental Pollution | 35 | Top 2% | 1.1% |
| 23 | Journal of Microbiological Methods | 11 | Top 0.3% | 1.1% |
| 24 | Computational and Structural Biotechnology Journal | 216 | Top 8% | 0.9% |
| 25 | Indoor Air | 10 | Top 0.3% | 0.8% |
| 26 | Molecular Ecology | 304 | Top 4% | 0.8% |
| 27 | eLife | 5422 | Top 56% | 0.8% |
| 28 | Microbial Genomics | 204 | Top 2% | 0.7% |
| 29 | Environment International | 42 | Top 1% | 0.7% |
| 30 | Environmental Microbiology Reports | 27 | Top 0.8% | 0.7% |
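
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum over the listed percentages. The sketch below uses the displayed values for ranks 1 to 10; these are rounded to one decimal place on the page, so the totals are approximate. The function name `ranks_to_reach` is a hypothetical helper introduced for illustration.

```python
# Displayed predicted probabilities (percent) for ranks 1-10, as listed above.
# Values are rounded on the page, so sums are approximate.
listed_probabilities = [17.1, 8.2, 6.2, 4.2, 4.2, 3.9, 3.9, 3.9, 3.0, 2.7]


def ranks_to_reach(probabilities, threshold):
    """Return (rank, cumulative sum) at the first rank where the running
    total meets or exceeds the threshold, or None if it never does."""
    cumulative = 0.0
    for rank, probability in enumerate(probabilities, start=1):
        cumulative += probability
        if cumulative >= threshold:
            return rank, cumulative
    return None


result = ranks_to_reach(listed_probabilities, 50.0)
if result is not None:
    rank, cumulative = result
    print(f"Threshold reached at rank {rank} (cumulative {cumulative:.1f}%)")
```

With the listed values, the running total stays below 50% through rank 7 and crosses it at rank 8, which matches the page's summary.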