Benchmarking Short-Read ITS2 and Full-Length ITS Sequencing Reveals Pipeline-Dependent Biases in Indoor Fungal Community Profiling
Dong, M.; Blackwood, D.; Lott, M. E. J.; Castro, S. P.; Larkin, X.; Clerkin, T.; Hemric, H.; Nash, J.; Kim, Y. J.; Arnold, J.; David, L. A.; Vilgalys, R.; Fodor, A. A.; Noble, R. T.
Short-read amplicon sequencing is widely used for fungal surveys but can limit taxonomic resolution. Long-read sequencing enables recovery of the full internal transcribed spacer (ITS) region and may improve ecological and taxonomic inference. Here, we conducted a paired comparison of Illumina ITS2 and PacBio HiFi full-length ITS sequencing using identical DNA extracts from built-environment air and surface samples (n = 68) collected across homes, a dormitory, and laboratories. Both datasets were taxonomically assigned using the same algorithm and reference database. We performed paired statistics, in-silico ITS2 trimming of long-read sequences, and cross-platform mapping at multiple identity thresholds. Full-length ITS provided higher taxonomic resolution, assigning a greater fraction of ASVs at the family (98% vs. 88%) and species (42% vs. 32%) ranks than ITS2 (paired Wilcoxon q = 0.002). Alpha-diversity comparisons showed similar Shannon diversity across pipelines, whereas richness metrics were consistently higher for full-length ITS. Beta-diversity analyses indicated broadly comparable community-level patterns, although full-length ITS revealed stronger sample-type- and location-associated structure (PERMANOVA R² ≥ 0.06, p = 0.0001). In-silico ITS2 trimming reduced these differences, indicating that amplicon length is a major contributor to enhanced taxonomic resolution and ecological inference. Cross-platform mapping further showed extensive one-to-many relationships between ITS2 and full-length ITS ASVs, consistent with increased sequence resolution in long-read data. Together, these results show that ITS2 sequencing provides robust community-level profiling, while full-length ITS enables improved richness estimates and finer ecological and taxonomic resolution. This paired, bias-aware framework provides a practical template for selecting fungal amplicon sequencing strategies in built-environment mycobiome studies.
Importance
Fungal communities in built environments influence indoor air quality and human exposure, yet their characterization depends strongly on sequencing strategy. This study provides a controlled, paired comparison of short-read ITS2 and long-read full-length ITS sequencing, showing that differences in amplicon length substantially contribute to variation in taxonomic resolution and ecological inference. While both approaches yield comparable community-level patterns, full-length ITS improves richness estimates, species-level assignment, and environmental discrimination by resolving sequence variation collapsed in ITS2 surveys. By integrating paired diversity analyses, in-silico ITS2 trimming, and cross-platform ASV mapping, this work offers a bias-aware framework for evaluating fungal amplicon pipelines. Importantly, improved species-level resolution enables functional interpretation of indoor fungi, for example the identification of taxa associated with pathogenic traits, allergen production, or toxin synthesis, supporting the development of more informative exposure metrics and targeted assays relevant to human health in built environments.
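The one-to-many relationship between ITS2 and full-length ITS ASVs described above can be illustrated with a small sketch. It is not the authors' mapping procedure: the abstract does not name the aligner or the identity thresholds, so the ungapped sliding-window identity, the function names, the toy sequences, and the thresholds below are all assumptions chosen for illustration. A real analysis would more likely use a dedicated alignment tool that handles gaps.

```python
from collections import defaultdict


def best_ungapped_identity(query: str, target: str) -> float:
    """Highest fraction of matching bases over every ungapped placement
    of `query` inside `target` (hypothetical stand-in for a real aligner)."""
    if not query or len(query) > len(target):
        return 0.0
    best = 0
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        matches = sum(q == t for q, t in zip(query, window))
        best = max(best, matches)
    return best / len(query)


def map_its2_to_full(its2_asvs: dict, full_asvs: dict, threshold: float) -> dict:
    """Return {ITS2 ASV id: [full-length ASV ids]} for all pairs whose
    best ungapped identity meets the threshold."""
    mapping = defaultdict(list)
    for its2_id, its2_seq in its2_asvs.items():
        for full_id, full_seq in full_asvs.items():
            if best_ungapped_identity(its2_seq, full_seq) >= threshold:
                mapping[its2_id].append(full_id)
    return dict(mapping)


# Toy sequences (invented, not from the study).
its2_toy = {"its2_A": "ACGTACGTAC"}
full_toy = {
    # Identical ITS2 region.
    "full_1": "TTTT" + "ACGTACGTAC" + "GGGG",
    # Identical ITS2 region, but the flanking sequence differs.
    "full_2": "TTCT" + "ACGTACGTAC" + "GGGA",
    # One mismatch inside the ITS2 region.
    "full_3": "TTTT" + "ACGTTCGTAC" + "GGGG",
}

for threshold in (1.0, 0.85):
    hits = map_its2_to_full(its2_toy, full_toy, threshold)
    print(threshold, hits)
```

In this toy case, two full-length ASVs that differ only outside the ITS2 region map to the same ITS2 ASV even at 100% identity, which is the kind of collapse the abstract attributes to the shorter amplicon. Relaxing the threshold adds the third variant.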
Matching journals
The top 8 journals account for 50% of the predicted probability mass.