Integrating Structural Variants into Sequence-Based GWAS Using a Pangenome and Imputation Framework in French Dairy Cattle
Naji, M.; Sorin, V.; Grohs, C.; Fritz, S.; Klopp, C.; Faraut, T.; Boichard, D.; Sanchez, M.-P.; Boussaha, M.
Structural variants (SVs) are most effectively identified using long-read (LR) sequencing. However, LR data remain limited, and sequenced samples often lack associated phenotypic information. To overcome this limitation, we combined pangenome-based (variation graph) and imputation approaches to enable large-scale SV association studies in the three main French dairy cattle breeds. A variation graph was constructed using 69,892 deletions, 89,900 insertions, and 17,402 duplications detected in 176 LR samples. We subsequently genotyped 939 samples for each SV in the panel by realigning their short-read (SR) sequences to the graph. Validation analyses showed high genotype concordance rates for deletions (0.79) and insertions (0.79); however, the rate for duplications was low (0.14), leading to their exclusion from this study. Retained SVs were combined with single nucleotide variants (SNVs) and served as a sequence-level imputation reference panel. From SNP genotyping array data, we imputed SVs and SNVs for 11,902 Holstein, 3,753 Montbeliarde, and 3,053 Normande bulls. After quality control, more than 14 million SNVs and 40 thousand SVs were retained for within-breed genome-wide association studies (GWAS) with daughter yield deviations for 13 traits related to milk production, udder health, fertility, and stature. The GWAS results revealed genetic architectures consistent with earlier findings and uncovered 36 unique significant SV-trait associations. Conditional analysis revealed that ten of these SVs were the lead variants in quantitative trait loci related to fat content, protein content, and stature.
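The abstract does not define how the genotype concordance rates were computed. As a rough illustration only, the sketch below assumes concordance is the fraction of samples whose graph-based SR genotype matches the LR-derived genotype for a given SV, with missing calls skipped. The function name `genotype_concordance`, the 0/1/2 dosage coding, and the toy data are all hypothetical and not taken from the study.

```python
from typing import Optional, Sequence


def genotype_concordance(
    truth: Sequence[Optional[int]], test: Sequence[Optional[int]]
) -> float:
    """Fraction of samples with identical genotype calls.

    Genotypes are coded as alternate-allele dosage (0, 1, 2); None marks a
    missing call, and a sample missing in either set is skipped.
    This definition is an assumption, not the authors' stated metric.
    """
    if len(truth) != len(test):
        raise ValueError("truth and test must cover the same samples")
    compared = 0
    matches = 0
    for t, g in zip(truth, test):
        if t is None or g is None:
            continue  # skip samples without a call in both sets
        compared += 1
        if t == g:
            matches += 1
    if compared == 0:
        return float("nan")  # no comparable samples
    return matches / compared


# Toy data for one SV: LR-derived calls vs. SR graph-based calls.
lr_calls = [0, 1, 2, 1, None, 0]
sr_calls = [0, 1, 1, 1, 2, 0]
print(genotype_concordance(lr_calls, sr_calls))  # 4 of 5 comparable calls agree: 0.8
```

Under this assumed definition, a per-class rate such as those reported for deletions, insertions, and duplications would be obtained by pooling or averaging the per-SV values within each class.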
Matching journals
The top 7 journals account for 50% of the predicted probability mass.