Assembly of a pangenome uncovers novel non-reference unique insertion sequences in cattle highlighting their genetic diversity

Sorin, V.; Besnard, F.; Capitan, A.; Grohs, C.; Naji, M. M.; Escouflaire, C.; Fritz, S.; Lledo, J.; Eche, C.; Iampietro, C.; Donnadieu, C.; Milan, D.; Drouilhet, L.; Tosser-Klopp, G.; Boichard, D.; Klopp, C.; Sanchez, M.-P.; Boussaha, M.

2025-12-04 genetics
10.64898/2025.12.02.691810 bioRxiv

Background: The current cattle reference genome, derived from a single Hereford cow, does not capture the full spectrum of genetic diversity present within the species. Moreover, detecting structural variations (SVs; ≥ 50 nucleotides long) remains challenging when standard short-read or long-read approaches are applied against a linear reference genome. Recent advances in long-read sequencing technologies and graph-based assembly now enable the construction of breed-specific pangenomes, revealing previously uncharacterized genomic regions that may contribute to important agricultural traits.

Results: In this study, we constructed a cattle pangenome graph using 16 high-quality haplotype-resolved genome assemblies originating from nine breeds representing the diversity of French cattle populations, and including yak (Bos grunniens) as a close outgroup species. Using a trio-based strategy combined with complementary sequencing technologies and bioinformatics methods, we identified and characterized 101,219 structural variations. Of these, 33,634 were classified as non-reference unique insertions (NRUIs), adding several megabases of novel genomic sequence absent from the current Hereford reference genome. Analysis of the distribution of these NRUIs revealed significant genome-wide enrichment within QTL regions associated with milk production and morphological traits, suggesting that they contribute to the genetic basis of economically relevant phenotypes. Furthermore, functional annotation highlighted two NRUIs located within intronic regions of ARMH3 and EPHA5, both specific to the Normande breed and significantly associated with milk production and morphological traits, respectively.

Conclusions: Our findings demonstrate the value of pangenome approaches for uncovering functionally relevant SVs, particularly NRUIs, that are systematically absent from the current reference genome. By linking these variants to economically important traits, our work underscores the need to incorporate breed diversity into future genomic analyses and reference-building efforts in cattle.

Matching journals

The top 4 journals together account for more than 50% of the predicted probability mass.

1. BMC Genomics (328 papers in training set; Top 0.1%): 23.4%
2. Genetics Selection Evolution (33 papers in training set; Top 0.1%): 14.9%
3. Scientific Reports (3102 papers in training set; Top 4%): 10.8%
4. Frontiers in Genetics (197 papers in training set; Top 0.4%): 9.5%

50% of probability mass above

5. PLOS ONE (4510 papers in training set; Top 35%): 4.0%
6. Communications Biology (886 papers in training set; Top 3%): 2.7%
7. Nature Communications (4913 papers in training set; Top 44%): 2.7%
8. G3 Genes|Genomes|Genetics (351 papers in training set; Top 0.8%): 2.7%
9. G3: Genes, Genomes, Genetics (222 papers in training set; Top 0.4%): 1.8%
10. Microbial Genomics (204 papers in training set; Top 1%): 1.4%
11. Molecular Ecology Resources (161 papers in training set; Top 0.7%): 1.4%
12. Genome Biology (555 papers in training set; Top 5%): 1.3%
13. Bioinformatics (1061 papers in training set; Top 8%): 1.3%
14. Gigabyte (60 papers in training set; Top 1%): 1.0%
15. Genes (126 papers in training set; Top 2%): 0.8%
16. Molecular Ecology (304 papers in training set; Top 4%): 0.8%
17. BMC Bioinformatics (383 papers in training set; Top 7%): 0.8%
18. PLOS Genetics (756 papers in training set; Top 14%): 0.8%
19. Genomics (60 papers in training set; Top 2%): 0.8%
20. Biology (43 papers in training set; Top 3%): 0.7%
21. Scientific Data (174 papers in training set; Top 2%): 0.7%
22. Genome Research (409 papers in training set; Top 4%): 0.7%
23. Royal Society Open Science (193 papers in training set; Top 5%): 0.7%
24. BMC Biology (248 papers in training set; Top 5%): 0.7%
25. Journal of Dairy Science (11 papers in training set; Top 0.1%): 0.7%
26. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 48%): 0.5%
27. Bioinformatics Advances (184 papers in training set; Top 5%): 0.5%
28. Genome Medicine (154 papers in training set; Top 10%): 0.5%
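
The 50% cutoff in the list above can be checked by accumulating the listed probabilities in rank order. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, the inputs are the rounded percentages as displayed (so the sum is approximate), and it is not a description of how the page itself computes the cutoff.

```python
# Rounded probabilities (percent) for the six top-ranked journals, copied from the list.
top_probs = [23.4, 14.9, 10.8, 9.5, 4.0, 2.7]


def journals_to_reach(probs, threshold):
    """Return (count, cumulative) for the shortest ranked prefix whose sum reaches threshold.

    Hypothetical helper: returns (None, total) if the threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


count, total = journals_to_reach(top_probs, 50.0)
print(count, round(total, 1))
```

With these rounded values the first three journals sum to just under 50%, so the fourth is the one that crosses the threshold.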