Integrated optimization of experimental and computational workflows improves genome recovery in long-read gut metagenomics

Hu, Y.; Sun, L.; Huang, Y.; Jiang, F.; Tong, X.; Yang, J.; Ju, Y.; Yang, Z.; Liufu, S.; Hu, Y.; Ma, W.; Guo, R.; Li, W.; Zhang, T.; Zhu, X.; Zhang, Z.

2026-05-26 bioinformatics
10.64898/2026.05.22.727065 bioRxiv
Short-read metagenomic sequencing is widely used in microbiome research because of its high quality and increasingly affordable cost. However, its short, fragmented reads limit assembly contiguity and the recovery of complete microbial genomes. Long-read sequencing, with substantially longer reads, can help overcome these limitations. Achieving complete and accurate genome recovery is a central goal in metagenomics. To advance this goal, we present a systematic effort to unify and optimize the long-read sequencing workflow, from experimental sample processing to computational genome assembly, using the CycloneSEQ platform.

Importance: Our results underscore that upstream protocol selection is critical for the performance of long-read metagenomic sequencing. Magnetic plate-based DNA extraction with pretreatment during library preparation generated longer DNA fragments and, consequently, longer sequencing reads. These improvements directly contributed to enhanced data quality and better recovery of microbial diversity. Subsequent assembly benchmarking showed that integrating matched long-read (CycloneSEQ) and short-read (DNBSEQ) datasets achieved optimal performance, with long-read data improving assembly contiguity and short-read data improving the quality of the assembled MAGs. Finally, while the hybrid approach recovered more genomes, long-read assembly followed by short-read polishing achieved the best overall performance on fecal metagenome data, effectively balancing genomic contiguity and sequence accuracy.

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | mSystems | 361 | Top 0.4% | 12.8% |
| 2 | Microbiome | 139 | Top 0.3% | 8.5% |
| 3 | Scientific Reports | 3102 | Top 23% | 4.9% |
| 4 | Microbial Genomics | 204 | Top 0.5% | 4.3% |
| 5 | Briefings in Bioinformatics | 326 | Top 2% | 4.0% |
| 6 | Bioinformatics | 1061 | Top 5% | 4.0% |
| 7 | Computational and Structural Biotechnology Journal | 216 | Top 2% | 3.6% |
| 8 | mSphere | 281 | Top 1% | 3.6% |
| 9 | Water Research | 74 | Top 0.6% | 3.6% |
| 10 | PLOS ONE | 4510 | Top 42% | 3.1% |

50% of probability mass above

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 11 | Nature Communications | 4913 | Top 44% | 2.7% |
| 12 | Frontiers in Microbiology | 375 | Top 4% | 2.5% |
| 13 | Microbiology Spectrum | 435 | Top 2% | 2.1% |
| 14 | PeerJ | 261 | Top 5% | 2.1% |
| 15 | BMC Bioinformatics | 383 | Top 4% | 1.9% |
| 16 | Microorganisms | 101 | Top 0.5% | 1.9% |
| 17 | Genome Biology | 555 | Top 4% | 1.8% |
| 18 | Cell Reports Methods | 141 | Top 3% | 1.3% |
| 19 | PLOS Computational Biology | 1633 | Top 18% | 1.3% |
| 20 | iScience | 1063 | Top 19% | 1.3% |
| 21 | npj Biofilms and Microbiomes | 56 | Top 1% | 1.2% |
| 22 | BMC Microbiology | 35 | Top 1.0% | 1.1% |
| 23 | BMC Genomics | 328 | Top 4% | 0.9% |
| 24 | GigaScience | 172 | Top 2% | 0.9% |
| 25 | Communications Biology | 886 | Top 18% | 0.9% |
| 26 | F1000Research | 79 | Top 3% | 0.9% |
| 27 | Science of The Total Environment | 179 | Top 4% | 0.8% |
| 28 | mBio | 750 | Top 11% | 0.8% |
| 29 | Gut Microbes | 70 | Top 0.9% | 0.8% |
| 30 | Environmental Science & Technology Letters | 22 | Top 0.3% | 0.8% |