
Population specific reference panels are crucial for the genetic analyses of Native Hawai’ians: an example of the CREBRF locus

Lin, M.; Caberto, C.; Wan, P.; Li, Y.; Lum-Jones, A.; Tiirikainen, M.; Pooler, L.; Nakamura, B.; Sheng, X.; Porcel, J.; Lim, U.; Setiawa, V. W.; Le Marchand, L.; Wilkens, L. R.; Haiman, C. A.; Cheng, I.; Chiang, C. W. K.

2019-10-01 genetics
10.1101/789073 bioRxiv

Statistical imputation applied to genome-wide array data is the most cost-effective approach to complete the catalog of genetic variation in a study population. However, imputed genotypes in underrepresented populations incur greater inaccuracies due to ascertainment bias and a lack of representation among reference individuals, further contributing to the obstacles to studying these populations. Here we examined the consequences of this lack of representation by genotyping a functionally important, Polynesian-specific variant, rs373863828, in the CREBRF gene, in a large number of self-reported Native Hawaiians (N = 3,693) from the Multiethnic Cohort. We found that the derived allele of rs373863828 was significantly associated with several adiposity traits with large effects (e.g., 0.214 s.d., or approximately 1.28 kg/m², per allele for BMI, the most significant trait; P = 7.5 × 10⁻⁵). Due to the current absence of Polynesian representation in publicly accessible reference sequences, neither rs373863828 nor any of its proxies could be tested through imputation using these existing resources. Moreover, the association signals at this Polynesian-specific variant could not be captured by alternative approaches, such as admixture mapping. In contrast, highly accurate imputation can be achieved even if only a small number (<200) of Polynesian reference individuals are available. By constructing an internal set of Polynesian reference individuals, we were able to increase the sample size for analysis to as many as 3,936 individuals and improved the statistical evidence of association (e.g., P = 1.5 × 10⁻⁷, 3 × 10⁻⁶, and 1.4 × 10⁻⁴ for BMI, hip circumference, and T2D, respectively). Taken together, our results suggest the alarming possibility that lack of representation in reference panels would inhibit discovery of functionally important, population-specific loci such as CREBRF. Yet such loci could be easily detected and prioritized with improved representation of diverse populations in sequencing studies.
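
The abstract quotes the BMI effect in two units: 0.214 s.d. and approximately 1.28 kg/m² per allele. As a rough illustration of how the two figures relate, the sketch below back-calculates the BMI standard deviation they imply. That standard deviation is not reported in the abstract, so the result is an inference from rounded numbers, and the helper `sd_to_bmi` is a hypothetical name used only here.

```python
# Per-allele effect of rs373863828 on BMI, as quoted in the abstract.
effect_sd_units = 0.214   # in standard deviations
effect_bmi_units = 1.28   # in kg/m^2 (stated as approximate)

# Assumption: the two figures describe the same effect, so their ratio
# is the phenotypic s.d. of BMI in the sample (not reported directly).
implied_bmi_sd = effect_bmi_units / effect_sd_units


def sd_to_bmi(effect_in_sd, bmi_sd=implied_bmi_sd):
    """Convert an effect in s.d. units to kg/m^2 using the implied s.d."""
    return effect_in_sd * bmi_sd


print(f"Implied BMI s.d.: {implied_bmi_sd:.2f} kg/m^2")
print(f"0.214 s.d. per allele = {sd_to_bmi(0.214):.2f} kg/m^2")
```

Because both inputs are rounded, the implied standard deviation (about 6 kg/m²) should be read as an order-of-magnitude check rather than a reported statistic.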

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Frontiers in Genetics: 197 papers in training set, Top 0.1%, 14.4%
2. Human Genetics and Genomics Advances: 70 papers in training set, Top 0.1%, 9.9%
3. Cell Genomics: 162 papers in training set, Top 0.2%, 9.9%
4. PLOS Genetics: 756 papers in training set, Top 2%, 8.2%
5. Human Molecular Genetics: 130 papers in training set, Top 0.3%, 6.7%
6. European Journal of Human Genetics: 49 papers in training set, Top 0.1%, 6.2%

50% of probability mass above

7. International Journal of Epidemiology: 74 papers in training set, Top 0.4%, 4.7%
8. The American Journal of Human Genetics: 206 papers in training set, Top 1%, 4.1%
9. eLife: 5422 papers in training set, Top 27%, 3.5%
10. PLOS ONE: 4510 papers in training set, Top 46%, 2.4%
11. Molecular Ecology: 304 papers in training set, Top 2%, 2.0%
12. Scientific Reports: 3102 papers in training set, Top 63%, 1.5%
13. Genome Medicine: 154 papers in training set, Top 5%, 1.5%
14. Genetics Selection Evolution: 33 papers in training set, Top 0.1%, 1.2%
15. Nature Communications: 4913 papers in training set, Top 57%, 1.2%
16. Genes: 126 papers in training set, Top 2%, 1.1%
17. Genetic Epidemiology: 46 papers in training set, Top 0.6%, 1.1%
18. International Journal of Obesity: 25 papers in training set, Top 0.5%, 0.9%
19. BMC Genomics: 328 papers in training set, Top 5%, 0.8%
20. Communications Biology: 886 papers in training set, Top 22%, 0.8%
21. Human Genomics: 21 papers in training set, Top 0.3%, 0.8%
22. Genetics: 225 papers in training set, Top 4%, 0.8%
23. Genome Biology and Evolution: 280 papers in training set, Top 2%, 0.7%
24. Gene: 41 papers in training set, Top 2%, 0.7%
25. Journal of Medical Genetics: 28 papers in training set, Top 0.6%, 0.7%
26. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 3%, 0.7%
27. Molecular Genetics and Genomics: 11 papers in training set, Top 0.4%, 0.7%
28. Genomics: 60 papers in training set, Top 3%, 0.7%
29. Genome Biology: 555 papers in training set, Top 9%, 0.6%
30. Human Mutation: 29 papers in training set, Top 0.9%, 0.6%
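
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below is a minimal illustration using the first eight values from the list; `journals_needed` is a hypothetical helper name, and the percentages are the rounded figures shown on the page, so the running totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-8, copied from the list.
probs = [14.4, 9.9, 9.9, 8.2, 6.7, 6.2, 4.7, 4.1]


def journals_needed(probabilities, threshold):
    """Return (rank, cumulative total) at which the running sum first
    reaches the threshold, or None if it never does."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank, total
    return None


rank, total = journals_needed(probs, 50.0)
print(f"{rank} journals reach {total:.1f}% of the probability mass")
```

With these values the running total is about 49.1% after five journals and about 55.3% after six, which agrees with the page placing the 50% cutoff after rank 6.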