
Identifying crossovers in a cattle pangenome containing haplotype-resolved assemblies from half-siblings

Leonard, A. S.; Pausch, H.

2026-02-20 genomics
10.64898/2026.02.20.706955 bioRxiv

Background: Recombination of parental haplotypes is a fundamental biological process that ensures proper segregation of homologous chromosomes and creates new combinations of alleles during meiosis. Crossover events are typically detected from large-scale pedigree-based genetic studies or linkage disequilibrium-based recombination maps, although these are generally limited to SNPs. Increasing amounts of long-read sequencing and haplotype-resolved assemblies offer an alternative approach to examining recombination events at base-pair resolution, albeit with much smaller sample sizes.
Results: Here, we analyse five high-quality genome assemblies from the Simmental cattle breed, including a newly assembled trio-binned HiFi assembly of an Eringer x Simmental cross (N50 of 77 Mb and a k-mer quality value of 55.3). We integrate the five assemblies, of which two originate from maternal half-siblings, into a reference-free Simmental-specific pangenome. By considering path similarities in the pangenome, we were able to identify putative crossover events in the haplotypes of the half-siblings, as well as a greater number of events relative to the cousin due to an additional degree of generational separation. We validated the pangenome approach with phased SNPs called from linear alignments of maternal short-read sequencing, with 23 of 30 chromosomes having the same recombination predictions. We identified 5 Mb and 16.7 Mb of non-reference insertion sequence that were, respectively, shared between or private to the half-siblings, enabling testing for recombination events beyond only SNP markers. We also identified four differentially methylated CpG clusters from the 5mC signal of HiFi reads, which allowed us to narrow the window containing the putative recombination event from 35 Mb to 20 Mb within the longest run of homozygosity.
Conclusion: Structural variants and methylation information identified from long-read sequencing and genome assemblies may help identify recombination events in regions beyond those typically called from SNPs. Furthermore, while existing long-read-based methylation calls can be noisy and report unrealistic intermediate methylation levels, 5mC methylation appears to be a promising avenue for distinguishing haplotypes in the absence of genomic variation.
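
The idea of locating crossovers from changes in haplotype similarity between half-siblings can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that, for each ordered marker on a chromosome, a boolean has already been computed stating whether the two half-siblings' maternal haplotypes carry the same allele or pangenome path. The function name `putative_crossovers` and the `min_run` noise threshold are hypothetical, introduced only for illustration. A switch in the sharing state that persists for at least `min_run` markers is reported as an interval that may contain a crossover in one of the two siblings.

```python
from typing import List, Sequence, Tuple


def putative_crossovers(
    positions: Sequence[int],
    shared: Sequence[bool],
    min_run: int = 3,
) -> List[Tuple[int, int]]:
    """Return (left, right) intervals where the sharing state switches.

    positions: marker coordinates, sorted along one chromosome.
    shared:    True where the two maternal haplotypes match at that marker.
    min_run:   hypothetical threshold; a switch is only accepted once this
               many consecutive markers support the new state.
    """
    if len(positions) != len(shared):
        raise ValueError("positions and shared must have the same length")
    if not positions:
        return []

    intervals: List[Tuple[int, int]] = []
    current = shared[0]            # assumption: the first marker is not noise
    last_support = positions[0]    # last marker agreeing with the current state
    candidate_start = None         # first marker of a possible new state
    candidate_len = 0

    for pos, state in zip(positions[1:], shared[1:]):
        if state == current:
            # Marker agrees with the current state: discard any short candidate run.
            last_support = pos
            candidate_start, candidate_len = None, 0
            continue
        if candidate_len == 0:
            candidate_start = pos
        candidate_len += 1
        if candidate_len >= min_run:
            # Enough consecutive support: accept the switch and record the interval.
            intervals.append((last_support, candidate_start))
            current = state
            last_support = pos
            candidate_start, candidate_len = None, 0
    return intervals


if __name__ == "__main__":
    pos = [100, 200, 300, 400, 500, 600, 700, 800, 900]
    same = [True, True, True, False, True, True, False, False, False]
    print(putative_crossovers(pos, same, min_run=3))
```

In this toy input the isolated mismatch at position 400 is treated as noise, while the sustained change starting at 700 is reported as one interval. Real data would need a threshold tuned to marker density and error rates, and a sharing switch alone does not reveal which sibling carries the crossover.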

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. BMC Genomics (328 papers in training set; Top 0.1%): 22.2%
2. Genetics Selection Evolution (33 papers in training set; Top 0.1%): 8.3%
3. G3 Genes|Genomes|Genetics (351 papers in training set; Top 0.2%): 7.1%
4. Scientific Reports (3102 papers in training set; Top 15%): 6.7%
5. PLOS ONE (4510 papers in training set; Top 29%): 6.2%
50% of probability mass above
6. Methods in Ecology and Evolution (160 papers in training set; Top 0.8%): 3.9%
7. PLOS Genetics (756 papers in training set; Top 5%): 3.5%
8. Frontiers in Genetics (197 papers in training set; Top 2%): 3.5%
9. Nature Communications (4913 papers in training set; Top 45%): 2.6%
10. Genome Research (409 papers in training set; Top 2%): 2.1%
11. Communications Biology (886 papers in training set; Top 5%): 2.0%
12. Journal of Dairy Science (11 papers in training set; Top 0.1%): 2.0%
13. Gigabyte (60 papers in training set; Top 0.6%): 1.9%
14. Molecular Ecology Resources (161 papers in training set; Top 0.5%): 1.9%
15. Bioinformatics (1061 papers in training set; Top 7%): 1.7%
16. Genetics (225 papers in training set; Top 3%): 1.5%
17. G3: Genes, Genomes, Genetics (222 papers in training set; Top 0.5%): 1.3%
18. NAR Genomics and Bioinformatics (214 papers in training set; Top 2%): 1.3%
19. Genome Biology (555 papers in training set; Top 6%): 0.9%
20. Bioinformatics Advances (184 papers in training set; Top 4%): 0.9%
21. Royal Society Open Science (193 papers in training set; Top 5%): 0.7%
22. Genomics (60 papers in training set; Top 3%): 0.7%
23. GigaScience (172 papers in training set; Top 3%): 0.7%
24. Emerging Infectious Diseases (103 papers in training set; Top 3%): 0.7%
25. Developmental Dynamics (50 papers in training set; Top 0.8%): 0.7%
26. Microbial Genomics (204 papers in training set; Top 2%): 0.7%
27. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 48%): 0.6%