Learning polygenic scores for human blood cell traits

Xu, Y.; Vuckovic, D.; Ritchie, S. C.; Akbari, P.; Jiang, T.; Grealey, J.; Butterworth, A. S.; Ouwehand, W. H.; Roberts, D. J.; Angelantonio, E. D.; Danesh, J.; Soranzo, N.; Inouye, M.

2020-02-18 genetics
10.1101/2020.02.17.952788 bioRxiv
Polygenic scores (PGSs) for blood cell traits can be constructed using summary statistics from genome-wide association studies. Because univariate analysis may limit both the selection of variants and the modelling of their interactions in PGSs, this conventional method may yield sub-optimal performance. This study evaluated the relative effectiveness of four machine learning and deep learning methods, as well as a univariate method, in the construction of PGSs for 26 blood cell traits, using data from UK Biobank (n=~400,000) and INTERVAL (n=~40,000). Our results showed that learning methods can improve PGS construction for nearly every blood cell trait considered, with this superiority explained by the ability of machine learning methods to capture interactions among variants. This study also demonstrated that populations can be well stratified by the PGSs of these blood cell traits, even for traits that exhibit large differences between ages and sexes, suggesting potential for disease prevention. Our study also found genetic correlations between the PGSs for blood cell traits and PGSs for several common human diseases (recapitulating well-known associations between the blood cell traits themselves and certain diseases), which suggests that blood cell traits may be indicators and/or mediators for a variety of common disorders via shared genetic variants and functional pathways.
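
To make the idea of a score built from GWAS summary statistics concrete, here is a minimal sketch of a generic additive PGS: variants are selected one at a time by a p-value cut-off and their effect sizes are used as weights on allele dosages. This is an illustration only, not the pipeline used in the study; the function name `polygenic_score`, the `p_threshold` default, and the toy numbers are all assumptions made for the example.

```python
import numpy as np


def polygenic_score(dosages, betas, p_values, p_threshold=5e-8):
    """Additive polygenic score (hypothetical helper, for illustration).

    dosages  : (n_individuals, n_variants) effect-allele counts in [0, 2]
    betas    : (n_variants,) per-variant effect sizes from a GWAS
    p_values : (n_variants,) per-variant association p-values
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    p_values = np.asarray(p_values, dtype=float)

    # Univariate selection: each variant is kept or dropped on its own
    # p-value, with no modelling of interactions between variants.
    keep = p_values < p_threshold

    # Score = sum over kept variants of (dosage * effect size).
    return dosages[:, keep] @ betas[keep]


# Toy data, made up for the example: 2 individuals, 3 variants.
dosages = np.array([[0, 1, 2],
                    [2, 0, 1]])
betas = np.array([0.5, -0.2, 0.1])
p_values = np.array([1e-9, 1e-10, 0.3])  # third variant fails the cut-off

scores = polygenic_score(dosages, betas, p_values)
```

Because each variant is weighted independently, a score of this form cannot represent interactions among variants, which is the limitation the abstract attributes to the univariate approach.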

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank. Journal: papers in training set, percentile, predicted probability

1. Frontiers in Genetics: 197 papers in training set, Top 0.1%, 38.6%
2. Scientific Reports: 3102 papers in training set, Top 5%, 10.7%
3. Genetic Epidemiology: 46 papers in training set, Top 0.1%, 7.3%
50% of probability mass above
4. Human Molecular Genetics: 130 papers in training set, Top 0.5%, 4.3%
5. PLOS ONE: 4510 papers in training set, Top 37%, 3.8%
6. Human Genetics: 25 papers in training set, Top 0.1%, 3.1%
7. Human Genetics and Genomics Advances: 70 papers in training set, Top 0.1%, 2.8%
8. European Journal of Human Genetics: 49 papers in training set, Top 0.4%, 2.1%
9. PLOS Genetics: 756 papers in training set, Top 7%, 1.9%
10. BMC Bioinformatics: 383 papers in training set, Top 4%, 1.7%
11. PLOS Computational Biology: 1633 papers in training set, Top 16%, 1.7%
12. Communications Biology: 886 papers in training set, Top 8%, 1.7%
13. BMC Medical Genomics: 36 papers in training set, Top 0.5%, 1.5%
14. BMC Genomics: 328 papers in training set, Top 3%, 1.5%
15. Cell Genomics: 162 papers in training set, Top 6%, 0.8%
16. Briefings in Bioinformatics: 326 papers in training set, Top 6%, 0.8%
17. BioData Mining: 15 papers in training set, Top 0.8%, 0.8%
18. The American Journal of Human Genetics: 206 papers in training set, Top 4%, 0.7%
19. Bioinformatics: 1061 papers in training set, Top 10%, 0.7%
20. Journal of the American Medical Informatics Association: 61 papers in training set, Top 2%, 0.5%
21. International Journal of Epidemiology: 74 papers in training set, Top 3%, 0.5%
22. Nature Communications: 4913 papers in training set, Top 67%, 0.5%
23. Biology: 43 papers in training set, Top 4%, 0.5%