Assembly of a pangenome uncovers novel non-reference unique insertion sequences in cattle highlighting their genetic diversity
Sorin, V.; Besnard, F.; Capitan, A.; Grohs, C.; Naji, M. M.; Escouflaire, C.; Fritz, S.; Lledo, J.; Eche, C.; Iampietro, C.; Donnadieu, C.; Milan, D.; Drouilhet, L.; Tosser-Klopp, G.; Boichard, D.; Klopp, C.; Sanchez, M.-P.; Boussaha, M.
Background: The current cattle reference genome, derived from a single Hereford cow, does not capture the full spectrum of genetic diversity present within the species. Moreover, detecting structural variations (SVs, ≥ 50 nucleotides long) remains challenging when using only standard short- or long-read sequencing approaches against a linear reference genome. Recent advances in long-read sequencing technologies and graph-based assembly now enable the construction of breed-specific pangenomes, revealing previously uncharacterized genomic regions that may contribute to important agricultural traits.
Results: In this study, we constructed a cattle pangenome graph using 16 high-quality haplotype-resolved genome assemblies originating from nine breeds representing the diversity of French cattle populations, and including yak (Bos grunniens) as a close outgroup species. Using a trio-based strategy combined with complementary sequencing technologies and bioinformatics methods, we identified and characterized 101,219 structural variations. Of these, 33,634 were classified as non-reference unique insertions (NRUIs), adding several megabases of novel genomic sequences absent from the current Hereford reference genome. Analysis of the distribution of these NRUIs revealed significant genome-wide enrichment within QTL regions associated with milk production and morphological traits, suggesting their contribution to the genetic basis of economically relevant phenotypes. Furthermore, their functional annotation highlighted two NRUIs located within the intronic regions of ARMH3 and EPHA5, both specific to the Normande breed and significantly associated with milk production and morphological traits, respectively.
Conclusions: Our findings demonstrate the value of pangenome approaches to uncover functionally relevant SVs, particularly NRUIs, that are systematically absent from the current reference genome. By linking these variants to economically important traits, our work underscores the need to incorporate breed diversity into future genomic analyses and reference-building efforts in cattle.
Matching journals
The top 4 journals account for 50% of the predicted probability mass.