Detection and evaluation of copy number variation using both linked-read and short-read sequencing in New Zealand dairy cattle

Wang, Y.; Nugroho, T.; Johnson, T. J. J.; Couldrey, C.; Harris, B. L.

2026-04-23 bioinformatics
10.64898/2026.04.20.718595 bioRxiv

In recent years, genetic studies have made significant progress in identifying single-nucleotide polymorphisms (SNPs) associated with cattle health and production traits. However, it remains challenging to identify and validate more complex forms of variation, such as copy number variation (CNV) and other types of structural variation (SV). In this study, SV regions were identified from linked-read sequence data for 37 New Zealand dairy cattle, and a transmission-based framework was used to validate these variants at the population scale. A total of 62,438 putative autosomal SV regions were identified with the LongRanger pipeline following the 10x Genomics recommendations. Copy number states for these regions were subsequently estimated with a read-depth-based genotyping method, using CNVpytor, in a population-representative cohort of 2,306 animals sequenced with Illumina short-read technology. Mendelian inheritance of copy number states was assessed using linear mixed models incorporating pedigree information, and transmission levels were used to quantify the biological validity of each CNV region. Transmission levels ranged widely, with a mean of 0.5162 across all regions, and regions with higher transmission levels were proportionally enriched for larger SVs. A total of 7,218 CNV regions exhibited high transmission levels (>0.9), indicating strong evidence of inheritance. Among these, 7,136 overlapped CNV regions reported in one or more public datasets, while the remaining 82 high-confidence regions represent previously unreported variants. High-transmission CNV regions tended to show clear, discrete inheritance patterns in trio families, providing biological evidence that these CNVs are inherited within the population. Together, these results demonstrate that integrating linked-read sequencing with population-scale transmission-based validation provides a robust framework for identifying high-confidence CNV regions. This catalogue of validated CNV regions represents an important resource for downstream functional analyses and for the incorporation of structural variation into genomic selection and breeding programs.
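The split of high-transmission regions into previously reported and unreported variants (7136 + 82 = 7218 in the abstract) comes down to an interval-overlap test against public CNV catalogues. The abstract does not state the overlap criterion the authors used, so the sketch below assumes the simplest one: any shared base between half-open intervals counts as an overlap. The function name `split_known_novel` and all coordinates are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def split_known_novel(regions, public_regions):
    """Partition regions into those overlapping any public region and the rest.

    Regions are (chrom, start, end) tuples with half-open coordinates [start, end).
    Assumption: a single shared base is enough to count as an overlap.
    """
    # Index the public regions by chromosome so each query only scans its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in public_regions:
        by_chrom[chrom].append((start, end))

    known, novel = [], []
    for chrom, start, end in regions:
        # Two half-open intervals overlap when each starts before the other ends.
        hit = any(start < p_end and p_start < end
                  for p_start, p_end in by_chrom[chrom])
        (known if hit else novel).append((chrom, start, end))
    return known, novel


# Hypothetical toy coordinates, not taken from the study.
high_transmission = [("1", 100, 500), ("1", 1000, 1500), ("2", 200, 400)]
public = [("1", 450, 600), ("2", 400, 800)]

known, novel = split_known_novel(high_transmission, public)
print(len(known), len(novel))  # 1 2
```

A stricter criterion, such as a minimum reciprocal overlap fraction, would move some regions from the known set to the novel set, so the reported counts depend on this choice. The linear scan per chromosome is adequate for a few thousand regions; an interval tree would be the usual replacement at larger scale.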

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. BMC Genomics; 328 papers in training set; Top 0.1%; 22.5%
2. Scientific Reports; 3102 papers in training set; Top 4%; 12.3%
3. Frontiers in Genetics; 197 papers in training set; Top 0.2%; 10.4%
4. Genetics Selection Evolution; 33 papers in training set; Top 0.1%; 10.1%
50% of probability mass above
5. PLOS ONE; 4510 papers in training set; Top 23%; 7.2%
6. PLOS Genetics; 756 papers in training set; Top 4%; 3.6%
7. Animals; 20 papers in training set; Top 0.2%; 3.1%
8. The Plant Genome; 53 papers in training set; Top 0.3%; 2.4%
9. Communications Biology; 886 papers in training set; Top 7%; 1.8%
10. Gigabyte; 60 papers in training set; Top 0.7%; 1.7%
11. Nature Communications; 4913 papers in training set; Top 57%; 1.2%
12. Genetics; 225 papers in training set; Top 3%; 0.9%
13. Molecular Ecology Resources; 161 papers in training set; Top 0.9%; 0.9%
14. Genes; 126 papers in training set; Top 2%; 0.9%
15. Genome Research; 409 papers in training set; Top 4%; 0.9%
16. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 2%; 0.8%
17. Frontiers in Veterinary Science; 30 papers in training set; Top 0.9%; 0.7%
18. International Journal of Molecular Sciences; 453 papers in training set; Top 15%; 0.7%
19. Journal of Dairy Science; 11 papers in training set; Top 0.1%; 0.7%
20. Genomics, Proteomics & Bioinformatics; 171 papers in training set; Top 6%; 0.7%
21. Genomics; 60 papers in training set; Top 3%; 0.7%
22. Genome Medicine; 154 papers in training set; Top 9%; 0.7%
23. G3: Genes, Genomes, Genetics; 222 papers in training set; Top 1%; 0.7%
24. Molecular Ecology; 304 papers in training set; Top 4%; 0.7%
25. Bioinformatics; 1061 papers in training set; Top 10%; 0.6%
26. NAR Genomics and Bioinformatics; 214 papers in training set; Top 4%; 0.6%
27. Genome Biology; 555 papers in training set; Top 9%; 0.6%
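
The position of the "50% of probability mass" marker in the list above can be checked by accumulating the predicted probabilities in rank order until the running total reaches the threshold. The following is a minimal sketch using the first five listed values. The function name `journals_for_mass` is hypothetical, and treating "account for 50%" as "cumulative probability of at least 50%" is an assumption about how the page computes the cut-off.

```python
def journals_for_mass(probabilities, threshold=50.0):
    """Return (k, mass): the smallest k whose top-k probabilities sum to >= threshold.

    Returns (None, total) if the threshold is never reached.
    """
    total = 0.0
    for k, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        if total >= threshold:
            return k, total
    return None, total


# Predicted probabilities (%) of the five highest-ranked journals in the list.
top_probs = [22.5, 12.3, 10.4, 10.1, 7.2]

k, mass = journals_for_mass(top_probs)
print(k, round(mass, 1))  # 4 55.3
```

Under this reading, the first three journals sum to 45.2% and the fourth brings the total to 55.3%, which is consistent with the marker sitting after rank 4.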