noHiC: A Pipeline For Plant Contig Scaffolding Using Personalized References From Pangenome Graphs

Nguyen-Hoang, A.; Arslan, K.; Kopalli, V.; Windpassinger, S.; Perovic, D.; Stahl, A.; Golicz, A.

2026-03-19 bioinformatics
10.64898/2026.03.17.712436 bioRxiv

Hi-C data is commonly used for reference-free de novo scaffolding. However, with the rapid increase in high-quality reference genomes, reference-guided workflows are now more practical for assembling large numbers of target genomes without relying on costly and labor-intensive Hi-C sequencing. Recently, a pangenome graph-based haplotype sampling algorithm was introduced to generate personalized graphs for target genomes. Such graphs have strong potential as references for reference-guided contig scaffolding. Here, we present noHiC, a reference-guided scaffolding pipeline that supports the key steps of plant contig scaffolding. A distinctive feature of noHiC is the nohic-refpick script, which generates from a pangenome graph a best-fit synthetic reference (synref) that is genetically close to the target contigs. This enables the integration of genetic information from many references (up to 48 in our tests) without using them separately during scaffolding. Synrefs showed advantages over highly contiguous conventional references in reducing false contig breaking during reference-based correction. Additionally, nohic-refpick can be combined with fast scaffolders (ntJoin) to rapidly produce highly contiguous assemblies using synrefs derived from pangenome graphs. The noHiC pipeline, used alone or in combination with ntJoin, can generally produce assemblies that are structurally consistent with public Hi-C-based or manually curated genomes. The pipeline is publicly available at https://github.com/andyngh/noHiC.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Genome Biology | 555 | Top 0.1% | 14.4%
2 | Molecular Plant | 36 | Top 0.1% | 12.1%
3 | Bioinformatics | 1061 | Top 3% | 8.0%
4 | Plant Communications | 35 | Top 0.1% | 6.7%
5 | Plant Biotechnology Journal | 56 | Top 0.2% | 6.2%
6 | NAR Genomics and Bioinformatics | 214 | Top 0.4% | 4.7%
50% of probability mass above
7 | Horticulture Research | 43 | Top 0.4% | 4.2%
8 | GigaScience | 172 | Top 0.4% | 3.9%
9 | Briefings in Bioinformatics | 326 | Top 2% | 3.6%
10 | The Plant Journal | 197 | Top 2% | 3.5%
11 | Nature Communications | 4913 | Top 41% | 3.5%
12 | Genomics, Proteomics & Bioinformatics | 171 | Top 2% | 3.5%
13 | BMC Bioinformatics | 383 | Top 3% | 3.2%
14 | Genome Research | 409 | Top 2% | 1.7%
15 | Nucleic Acids Research | 1128 | Top 11% | 1.6%
16 | Bioinformatics Advances | 184 | Top 3% | 1.6%
17 | BMC Genomics | 328 | Top 4% | 1.2%
18 | Journal of Genetics and Genomics | 36 | Top 2% | 0.9%
19 | Plant Physiology | 217 | Top 3% | 0.8%
20 | PLOS Computational Biology | 1633 | Top 25% | 0.7%
21 | iScience | 1063 | Top 36% | 0.7%
22 | Computational and Structural Biotechnology Journal | 216 | Top 11% | 0.6%