Integrated optimization of experimental and computational workflows improves genome recovery in long-read gut metagenomics
Hu, Y.; Sun, L.; Huang, Y.; Jiang, F.; Tong, X.; Yang, J.; Ju, Y.; Yang, Z.; Liufu, S.; Hu, Y.; Ma, W.; Guo, R.; Li, W.; Zhang, T.; Zhu, X.; Zhang, Z.
Short-read metagenomic sequencing is widely applied in microbiome research because of its high data quality and increasingly affordable cost. However, its short, fragmented reads limit assembly contiguity and the recovery of complete microbial genomes. In contrast, long-read sequencing, with substantially longer read lengths, can help overcome these limitations. Achieving complete and accurate genome recovery is a central goal in metagenomics. To advance this goal, we present a systematic effort to unify and optimize the long-read sequencing workflow, from experimental sample processing to computational genome assembly, using the CycloneSEQ platform.

Importance
Our results underscore that upstream protocol selection is critical to the performance of long-read metagenomic sequencing. Magnetic plate-based DNA extraction, combined with pretreatment during library preparation, yielded longer DNA fragments and, consequently, longer sequencing reads. These improvements directly enhanced data quality and the recovery of microbial diversity. Subsequent assembly benchmarking showed that integrating matched long-read (CycloneSEQ) and short-read (DNBSEQ) datasets achieved optimal performance: long-read data improved assembly contiguity, and short-read data improved the quality of the assembled metagenome-assembled genomes (MAGs). Finally, although the hybrid approach recovered more genomes, the strategy of long-read assembly followed by short-read polishing achieved the best overall performance on fecal metagenome data, effectively balancing genomic contiguity and sequence accuracy.