Improving imputation quality in Samoans through the integration of population-specific sequences into existing reference panels
Carlson, J. C.; Krishnan, M.; Liu, S.; Anderson, K. J.; Zhang, J. Z.; Yapp, T.-A. J.; Chiyka, E. A.; Dikec, D. A.; Cheng, H.; Naseri, T.; Reupena, M. S.; Viali, S.; Deka, R.; Hawley, N. L.; McGarvey, S. T.; Weeks, D. E.; Minster, R. L.
Genotype imputation is fundamental to association studies, yet even gold-standard panels like TOPMed are limited in the populations for which they yield good imputation. Specifically, Pacific Islanders are poorly represented in extant panels. To address this, we constructed an imputation reference panel using 1,285 Samoan individuals with whole-genome sequencing, combined with 1000 Genomes Project (1KGP) individuals, to create a reference panel that better represents Pacific Islander, specifically Samoan, genetic variation. We compared this panel to the 1KGP and TOPMed-R3 panels based on imputed variants using genotyping array data for 1,834 Samoan participants who were not part of the panels. The 1KGP + 1285 Samoan panel yielded up to twice as many well-imputed (r² ≥ 0.80) variants as TOPMed-R3 and 1KGP and was enriched for moderate- and high-impact variants. Imputation accuracy improved across the minor allele frequency (MAF) spectrum, although the improvement was most pronounced for variants with 0.01 ≤ MAF ≤ 0.05. Imputation accuracy (r²) was greater for population-specific variants (high fixation index, FST) and those from larger haplotypes (high LD score). However, the gain in imputation accuracy over TOPMed-R3 was largest for small haplotypes (low LD score), reflecting the Samoan panel's ability to capture population-specific variation not well tagged by other panels. We also augmented the 1KGP reference panel with varying numbers of Samoan participants and found that panels with 24 Samoans yielded similar performance to TOPMed-R3, and panels with 48 or more Samoans outperformed TOPMed-R3 for all variants with MAF ≥ 0.001. Meta-imputation of the TOPMed-R3 and 1285 Samoan panels yielded poorer performance than the Samoan-only panel. We also demonstrated that the phasing of the reference panel impacts the imputation of population-specific variants when the reference panel is composed of individuals from an isolated population and not combined with ancestrally diverse haplotypes. This study identifies variants with improved imputation using population-specific reference panels and provides a framework for constructing other population-specific reference panels.
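As an illustration of the panel comparison described above, the count of well-imputed variants per MAF bin could be computed along the following lines. This is a minimal sketch, not the authors' pipeline: the r² ≥ 0.80 cutoff comes from the abstract, while the bin edges, the function names `maf_bin` and `count_well_imputed`, and the `(maf, r2)` input layout are assumptions made here for illustration.

```python
from bisect import bisect_right
from collections import Counter

R2_THRESHOLD = 0.80  # "well-imputed" cutoff stated in the abstract

# Hypothetical MAF bin edges, chosen only for illustration.
# Bins are left-closed; the last bin also includes its upper edge.
MAF_EDGES = [0.001, 0.01, 0.05, 0.5]
MAF_LABELS = ["0.001-0.01", "0.01-0.05", "0.05-0.5"]


def maf_bin(maf):
    """Return the label of the bin containing maf, or None if out of range."""
    if maf < MAF_EDGES[0] or maf > MAF_EDGES[-1]:
        return None
    idx = min(bisect_right(MAF_EDGES, maf) - 1, len(MAF_LABELS) - 1)
    return MAF_LABELS[idx]


def count_well_imputed(variants, threshold=R2_THRESHOLD):
    """Count variants with r2 >= threshold in each MAF bin.

    variants: iterable of (maf, r2) pairs for one reference panel.
    """
    counts = Counter()
    for maf, r2 in variants:
        label = maf_bin(maf)
        if label is not None and r2 >= threshold:
            counts[label] += 1
    return counts
```

Running `count_well_imputed` once per panel on the same set of target variants and comparing the resulting counts bin by bin gives the kind of per-frequency comparison the abstract reports.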