
LoFi drafts to map to: 4 haplotype-resolved Cannabis genomes enable characterization of large structural variants

Pike, B.; Goncalves da Silva, A.; Teran, W.

2026-01-22 plant biology
10.64898/2026.01.19.700373 bioRxiv

We present fully-phased, chromosome-scale genome assemblies of 4 genotypes of Cannabis sativa. These assemblies were built from Oxford Nanopore R9.4.1 long reads, which previously have been considered insufficiently accurate for proper phasing. Contigs produced by the Phased Error Correction and Assembly Tool (PECAT), in combination with Hi-C libraries, were used by GreenHill to develop intermediate data structures that permit accurate phasing of the dual contigs, which were then scaffolded by the advanced algorithm of Yet another Hi-C Scaffolder (YaHS). These assemblies, while low in QV, are comparable to recent HiFi assemblies in their contiguity and gene content, and also show good macrosynteny with them. We compare these 8 haplotypes with 77 others recently produced and present a phylogenetic analysis, as well as a first draft of the Cannabis pan-NLRome.

Core

We assembled four fully-phased and chromosome-scale diploid genomes of Cannabis sativa, using Oxford Nanopore Technology readsets. These new assemblies are comparable to recent PacBio HiFi assemblies in terms of contiguity and gene content. We present a phylogenomic analysis, using whole-genome alignments after including 77 other publicly available Cannabis genomes, as well as a draft pan-NLRome.

Gene and Accession Numbers

Assemblies are archived at NCBI as BioProjects PRJNA1301983 (ANC), PRJNA1301963 (HAW), PRJNA1301984 (SRI), and PRJNA1301985 (TRC). Assemblies, annotations, and Supplemental Tables are also available on Zenodo: https://doi.org/10.5281/zenodo.16456638.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. GigaScience; 172 papers in training set; Top 0.1%; 14.2%
2. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 8%; 8.3%
3. Genome Research; 409 papers in training set; Top 0.4%; 6.3%
4. Nature Communications; 4913 papers in training set; Top 33%; 4.8%
5. Scientific Reports; 3102 papers in training set; Top 25%; 4.8%
6. Genetics; 225 papers in training set; Top 0.9%; 4.5%
7. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 0.6%; 3.6%
8. Current Biology; 596 papers in training set; Top 5%; 3.6%
50% of probability mass above
9. Scientific Data; 174 papers in training set; Top 0.6%; 3.0%
10. Nucleic Acids Research; 1128 papers in training set; Top 7%; 2.7%
11. Frontiers in Genetics; 197 papers in training set; Top 3%; 2.6%
12. PLOS ONE; 4510 papers in training set; Top 49%; 2.1%
13. PLOS Genetics; 756 papers in training set; Top 7%; 2.1%
14. Science Advances; 1098 papers in training set; Top 13%; 2.1%
15. Genome Biology; 555 papers in training set; Top 4%; 1.9%
16. Nature Genetics; 240 papers in training set; Top 4%; 1.8%
17. G3: Genes, Genomes, Genetics; 222 papers in training set; Top 0.4%; 1.7%
18. eLife; 5422 papers in training set; Top 43%; 1.6%
19. Nature; 575 papers in training set; Top 12%; 1.3%
20. Communications Biology; 886 papers in training set; Top 13%; 1.3%
21. BMC Biology; 248 papers in training set; Top 2%; 1.3%
22. Viruses; 318 papers in training set; Top 4%; 1.2%
23. Frontiers in Plant Science; 240 papers in training set; Top 4%; 0.9%
24. International Journal of Molecular Sciences; 453 papers in training set; Top 13%; 0.9%
25. PeerJ; 261 papers in training set; Top 14%; 0.8%
26. Science; 429 papers in training set; Top 19%; 0.8%
27. The Plant Journal; 197 papers in training set; Top 3%; 0.7%
28. Genes; 126 papers in training set; Top 3%; 0.7%
29. iScience; 1063 papers in training set; Top 35%; 0.7%
30. Nature Biotechnology; 147 papers in training set; Top 8%; 0.7%
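
The 50% cutoff stated above can be checked with a running sum over the ranked probabilities. The following is a minimal sketch, not the site's own method: the helper name `journals_to_reach` is hypothetical, and the values are the percentages listed for the ten highest-ranked journals.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the ten highest-ranked journals,
# copied from the ranked list above.
probs = [14.2, 8.3, 6.3, 4.8, 4.8, 4.5, 3.6, 3.6, 3.0, 2.7]


def journals_to_reach(probs, threshold):
    """Return the smallest k whose top-k probabilities sum to at least
    `threshold`, or None if the threshold is never reached.

    Hypothetical helper for illustration only.
    """
    for k, total in enumerate(accumulate(probs), start=1):
        # Small tolerance guards against floating-point rounding in the sum.
        if total >= threshold - 1e-9:
            return k
    return None


k = journals_to_reach(probs, 50.0)
print(k, round(sum(probs[:k]), 1))
```

With these values the running total is 46.5% after seven journals and 50.1% after eight, which agrees with the statement that the top 8 journals account for 50% of the predicted probability mass.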