Quantifying factors that affect polygenic risk score performance across diverse ancestries and age groups for body mass index

Hui, D.; Xiao, B.; Dikilitas, O.; Freimuth, R. R.; Irvin, M. R.; Jarvik, G. P.; Kottyan, L.; Kullo, I.; Limdi, N. A.; Liu, C.; Luo, Y.; Namjou, B.; Puckelwartz, M. J.; Schaid, D.; Tiwari, H.; Wei, W.-Q.; Verma, S. S.; Kim, D.; Ritchie, M. D.

2022-05-28 genetic and genomic medicine
10.1101/2022.05.27.22275647 medRxiv
Polygenic risk scores (PRS) have led to enthusiasm for precision medicine. However, it is well documented that PRS do not generalize across groups differing in ancestry or in sample characteristics such as age. Quantifying the performance of PRS across different groups of study participants, using genome-wide association study (GWAS) summary statistics from multiple ancestry groups and sample sizes, and using different linkage disequilibrium (LD) reference panels, may clarify the factors limiting PRS transferability. To evaluate these factors in the PRS generation process, we generated body mass index (BMI) PRS (PRS_BMI) in the Electronic Medical Records and Genomics network (N=75,661). Analyses were conducted in two ancestry groups (European and African) and three age ranges (adults, teenagers, and children). For PRS_BMI calculations, we evaluated five LD reference panels and three sets of GWAS summary statistics of varying sample size and ancestry. PRS_BMI performance increased for both African and European ancestry individuals when using cross-ancestry GWAS summary statistics rather than European-only summary statistics (6.3% and 3.7% relative R² increase, respectively; p_African=0.038, p_European=6.26×10⁻⁴). The effects of LD reference panels were more pronounced in African ancestry study datasets. PRS_BMI performance degraded in children, where R² was less than half that of teenagers or adults. The effect of GWAS summary statistics sample size was small when modeled with the other factors. We also explored clinical comorbidities associated with the PRS_BMI and identified associations with type 2 diabetes and coronary atherosclerosis. This study quantifies the effects that ancestry, GWAS summary statistic sample size, and LD reference panel have on PRS performance, especially in cross-ancestry and age-specific analyses.
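
To make the "relative R² increase" figures concrete, here is a minimal Python sketch of the arithmetic. It assumes the relative increase is computed as (new R² minus baseline R²) divided by baseline R²; the abstract does not spell out the formula, so this is one plausible reading. The function name `relative_r2_increase` and the R² values in the example are hypothetical, since the abstract reports only the relative changes and not the underlying R² values.

```python
def relative_r2_increase(r2_baseline: float, r2_new: float) -> float:
    """Return the relative change in R^2 as a percentage.

    Assumption: "relative increase" means (new - baseline) / baseline.
    """
    if r2_baseline <= 0:
        raise ValueError("baseline R^2 must be positive")
    return (r2_new - r2_baseline) / r2_baseline * 100.0


if __name__ == "__main__":
    # Hypothetical R^2 values chosen only to illustrate the calculation;
    # they are NOT taken from the study.
    r2_european_only = 0.100   # PRS built from European-only summary statistics
    r2_cross_ancestry = 0.1063  # PRS built from cross-ancestry summary statistics
    change = relative_r2_increase(r2_european_only, r2_cross_ancestry)
    print(f"Relative R^2 increase: {change:.1f}%")
```

Read this way, a relative increase of a few percent corresponds to a small absolute gain in variance explained, which is worth keeping in mind when comparing the ancestry groups.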

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Cell Genomics | 162 papers in training set | Top 0.1% | 17.2%
2. The American Journal of Human Genetics | 206 papers in training set | Top 0.3% | 14.5%
3. Human Genetics and Genomics Advances | 70 papers in training set | Top 0.1% | 9.9%
4. Human Molecular Genetics | 130 papers in training set | Top 0.3% | 6.7%
5. Circulation: Genomic and Precision Medicine | 42 papers in training set | Top 0.3% | 6.2%
(50% of probability mass above)
6. Frontiers in Genetics | 197 papers in training set | Top 2% | 4.1%
7. eLife | 5422 papers in training set | Top 23% | 3.9%
8. Genome Medicine | 154 papers in training set | Top 2% | 3.6%
9. Genetic Epidemiology | 46 papers in training set | Top 0.2% | 3.5%
10. Scientific Reports | 3102 papers in training set | Top 46% | 2.6%
11. International Journal of Epidemiology | 74 papers in training set | Top 1% | 1.8%
12. Human Genetics | 25 papers in training set | Top 0.2% | 1.7%
13. BMC Medical Genomics | 36 papers in training set | Top 0.6% | 1.5%
14. Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.3%
15. PLOS Genetics | 756 papers in training set | Top 11% | 1.3%
16. Nature Communications | 4913 papers in training set | Top 57% | 1.2%
17. Human Genomics | 21 papers in training set | Top 0.2% | 1.1%
18. GENETICS | 189 papers in training set | Top 1% | 0.9%
19. The American Journal of Clinical Nutrition | 19 papers in training set | Top 0.3% | 0.8%
20. Nature Human Behaviour | 85 papers in training set | Top 4% | 0.8%
21. Bioinformatics | 1061 papers in training set | Top 10% | 0.7%
22. Communications Biology | 886 papers in training set | Top 27% | 0.7%
23. PLOS Computational Biology | 1633 papers in training set | Top 28% | 0.6%
24. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.6%
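
The 50% cutoff stated for the journal list can be checked with a short cumulative sum. The Python sketch below uses the predicted probabilities of the six highest-ranked journals as listed on this page; the function name `journals_needed` is hypothetical, and how the probabilities themselves were produced is not described here.

```python
# Predicted probabilities (in percent) for the six highest-ranked journals,
# copied from the list on this page.
TOP_JOURNALS = [
    ("Cell Genomics", 17.2),
    ("The American Journal of Human Genetics", 14.5),
    ("Human Genetics and Genomics Advances", 9.9),
    ("Human Molecular Genetics", 6.7),
    ("Circulation: Genomic and Precision Medicine", 6.2),
    ("Frontiers in Genetics", 4.1),
]


def journals_needed(ranked, threshold_percent):
    """Return how many top-ranked journals are needed for the cumulative
    probability to reach the threshold, or None if it is never reached."""
    cumulative = 0.0
    for count, (_name, probability) in enumerate(ranked, start=1):
        cumulative += probability
        if cumulative >= threshold_percent:
            return count
    return None


if __name__ == "__main__":
    print(journals_needed(TOP_JOURNALS, 50.0))
```

With these values the running total first reaches 50% at the fifth journal, which matches the position of the cutoff marker in the list.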