
Identification and correction of phase switches with Hi-C data in the Nanopore and HiFi chromosome-scale assemblies of the dikaryotic leaf rust fungus Puccinia triticina

Duan, H.; Jones, A.; Hewitt, T.; Mackenzie, A.; Hu, Y.; Sharp, A.; Lewis, D.; Mago, R.; Upadhyaya, N.; Rathjen, J.; Stone, E.; Schwessinger, B.; Figueroa, M.; Dodds, P.; Periyannan, S.; Sperschneider, J.

2021-04-29 genomics
10.1101/2021.04.28.441890 bioRxiv

Background: Most animals and plants have more than one set of chromosomes and package these haplotypes into a single nucleus within each cell. In contrast, many fungal species carry multiple haploid nuclei per cell. Rust fungi are such species, with two nuclei (karyons) that each contain a full set of haploid chromosomes. The physical separation of haplotypes in dikaryons means that, unlike in diploids, Hi-C chromatin contacts between haplotypes are false positive signals.

Results: We generate the first chromosome-scale, fully-phased assembly for the dikaryotic leaf rust fungus Puccinia triticina and compare Nanopore MinION and PacBio HiFi sequence-based assemblies. We show that false positive Hi-C contacts between haplotypes are predominantly caused by phase switches rather than by collapsed regions or Hi-C read mis-mappings. We introduce a method for phasing dikaryotic genomes into the two haplotypes using Hi-C contact graphs, including a phase switch correction step. In the HiFi assembly, relatively few phase switches occur; these are predominantly located at haplotig boundaries and can be readily corrected. In contrast, phase switches are widespread throughout the Nanopore assembly. We show that a haploid genome read coverage of 30-40 times using HiFi sequencing is required for phasing of the leaf rust genome (~0.7% heterozygosity) and that HiFi sequencing resolves genomic regions with low heterozygosity that are otherwise collapsed in the Nanopore assembly.

Conclusions: This first Hi-C-based phasing pipeline for dikaryons and comparison of long-read sequencing technologies will inform future genome assembly and haplotype phasing projects in other non-haploid organisms.
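The abstract describes phasing a dikaryotic assembly into two haplotypes from a Hi-C contact graph, followed by a phase switch correction step. The Python sketch that follows is not the authors' pipeline; it only illustrates the idea under stated assumptions: contig-to-contig Hi-C contact counts are already computed, every contig belongs to exactly one known allelic (haplotig) pair whose members must go to opposite nuclei, and, because the two nuclei are physically separate, contacts within a haplotype are rewarded while contacts between haplotypes are penalised. Phase switches are then flagged where the majority of per-bin contacts along a contig changes haplotype. The function names `phase_pairs` and `find_switches` and all inputs are hypothetical.

```python
def phase_pairs(contacts, allelic_pairs, max_rounds=100):
    """Assign each contig to haplotype +1 or -1 (hypothetical helper).

    contacts: dict mapping (contig_u, contig_v) -> Hi-C contact count.
    allelic_pairs: list of (a, b); a and b must land in opposite haplotypes.
    Assumes every contig in `contacts` appears in exactly one allelic pair.
    """
    # s[i] = +1 puts the first contig of pair i in haplotype +1.
    s = [1] * len(allelic_pairs)

    def labels():
        h = {}
        for (a, b), si in zip(allelic_pairs, s):
            h[a], h[b] = si, -si
        return h

    def score(h):
        # Same-haplotype contacts add, between-haplotype contacts subtract,
        # since inter-nuclear contacts are treated as false positives.
        return sum(w * h[u] * h[v] for (u, v), w in contacts.items())

    best = score(labels())
    for _ in range(max_rounds):
        improved = False
        for i in range(len(s)):
            s[i] = -s[i]              # try swapping pair i between haplotypes
            new = score(labels())
            if new > best:
                best, improved = new, True
            else:
                s[i] = -s[i]          # undo the flip
        if not improved:
            break
    return labels()


def find_switches(bin_contacts):
    """Return bin indices where the supported haplotype changes along a contig.

    bin_contacts: list of (contacts_to_hap_plus, contacts_to_hap_minus) per bin,
    in contig order. Tied bins inherit the previous bin's label.
    """
    labels, prev = [], 1
    for to_plus, to_minus in bin_contacts:
        if to_plus != to_minus:
            prev = 1 if to_plus > to_minus else -1
        labels.append(prev)
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


if __name__ == "__main__":
    toy_contacts = {("a1", "a2"): 50, ("b1", "b2"): 40,
                    ("a1", "b2"): 5, ("b1", "a2"): 3}
    print(phase_pairs(toy_contacts, [("a1", "b1"), ("a2", "b2")]))
    print(find_switches([(30, 2), (25, 4), (3, 28), (1, 35)]))  # [2]
```

The greedy pair-flipping search recomputes the score from scratch for every flip and can stop in a local optimum, so it is meant for reading rather than for real assemblies, which also contain contigs without an allelic partner.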

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. G3: 18.5% (33 papers in training set; Top 0.1%)
2. BMC Bioinformatics: 10.4% (383 papers in training set; Top 0.8%)
3. Bioinformatics: 10.1% (1061 papers in training set; Top 3%)
4. Applications in Plant Sciences: 4.8% (21 papers in training set; Top 0.1%)
5. Methods in Ecology and Evolution: 4.8% (160 papers in training set; Top 0.6%)
6. G3: Genes, Genomes, Genetics: 3.9% (222 papers in training set; Top 0.1%)
(50% of probability mass above)
7. BMC Genomics: 3.9% (328 papers in training set; Top 0.6%)
8. NAR Genomics and Bioinformatics: 3.6% (214 papers in training set; Top 0.7%)
9. New Phytologist: 3.2% (309 papers in training set; Top 2%)
10. Genome Biology: 3.2% (555 papers in training set; Top 3%)
11. Genetics: 2.6% (225 papers in training set; Top 2%)
12. G3 Genes|Genomes|Genetics: 1.9% (351 papers in training set; Top 1%)
13. Scientific Reports: 1.7% (3102 papers in training set; Top 58%)
14. BMC Biology: 1.7% (248 papers in training set; Top 1%)
15. PLOS Computational Biology: 1.7% (1633 papers in training set; Top 16%)
16. Molecular Ecology Resources: 1.7% (161 papers in training set; Top 0.6%)
17. Nature Communications: 1.7% (4913 papers in training set; Top 52%)
18. PLOS ONE: 1.3% (4510 papers in training set; Top 59%)
19. The Plant Journal: 0.9% (197 papers in training set; Top 3%)
20. Nucleic Acids Research: 0.9% (1128 papers in training set; Top 15%)
21. GigaScience: 0.9% (172 papers in training set; Top 2%)
22. Microbial Genomics: 0.9% (204 papers in training set; Top 2%)
23. Frontiers in Plant Science: 0.8% (240 papers in training set; Top 5%)
24. mSphere: 0.7% (281 papers in training set; Top 6%)
25. Bioinformatics Advances: 0.7% (184 papers in training set; Top 5%)
26. mBio: 0.6% (750 papers in training set; Top 13%)
27. Phytopathology®: 0.6% (28 papers in training set; Top 0.7%)
28. Cell Reports Methods: 0.6% (141 papers in training set; Top 6%)
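The statement that the top 6 journals account for 50% of the predicted probability mass reads as "6 is the smallest number of top-ranked journals whose combined probability reaches 50%" (the first six actually sum to 52.5%). A minimal Python check, assuming that reading; the helper name `journals_for_mass` is hypothetical and the values are the predicted probabilities for ranks 1-8 copied from the list:

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-8, copied from the journal list.
PROBS = [18.5, 10.4, 10.1, 4.8, 4.8, 3.9, 3.9, 3.6]


def journals_for_mass(probs, target=50.0):
    """Return the smallest k whose top-k probabilities sum to >= target, else None."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return k
    return None


print(journals_for_mass(PROBS))  # 6 (cumulative about 52.5%)
```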