noHiC: A Pipeline for Plant Contig Scaffolding Using Personalized References from Pangenome Graphs
Nguyen-Hoang, A.; Arslan, K.; Kopalli, V.; Windpassinger, S.; Perovic, D.; Stahl, A.; Golicz, A.
Hi-C data is commonly used for reference-free de novo scaffolding. However, with the rapid increase in high-quality reference genomes, reference-guided workflows are now more practical for assembling large numbers of target genomes without relying on costly and labor-intensive Hi-C sequencing. Recently, a pangenome graph-based haplotype sampling algorithm was introduced to generate personalized graphs for target genomes. Such graphs have strong potential as references for reference-guided contig scaffolding. Here, we present noHiC, a reference-guided scaffolding pipeline supporting key steps of plant contig scaffolding. A distinctive feature of noHiC is the nohic-refpick script, which generates from a pangenome graph a best-fit synthetic reference (synref) that is genetically close to the target contigs. This enables the integration of genetic information from many references (up to 48 in our tests) without using them separately during scaffolding. Synrefs showed advantages over highly contiguous conventional references in reducing false contig breaking during reference-based correction. Additionally, nohic-refpick can be combined with fast scaffolders (ntJoin) to rapidly produce highly contiguous assemblies using synrefs derived from pangenome graphs. The noHiC pipeline, used alone or in combination with ntJoin, can generally produce assemblies that are structurally consistent with public Hi-C-based or manually curated genomes. The pipeline is publicly available at https://github.com/andyngh/noHiC.
[Graphical abstract: Figure 1]
Matching journals
The top 6 journals account for 50% of the predicted probability mass.