Multi-ancestry polygenic risk scores using phylogenetic regularization

Layne, E.; Zabad, S.; Li, Y.; Blanchette, M.

2024-02-17 bioinformatics
10.1101/2024.02.14.580313 bioRxiv

Accurately predicting phenotype using genotype across diverse ancestry groups remains a significant challenge in human genetics. Many state-of-the-art polygenic risk score models are known to have difficulty generalizing to genetic ancestries that are not well represented in their training set. To address this issue, we present a novel machine learning method for fitting genetic effect sizes across multiple ancestry groups simultaneously, while leveraging prior knowledge of the evolutionary relationships among them. We introduce DendroPRS, a machine learning model where SNP effect sizes are allowed to evolve along the branches of the phylogenetic tree capturing the relationship among populations. DendroPRS outperforms existing approaches at two important genotype-to-phenotype prediction tasks: expression QTL analysis and polygenic risk scores. We also demonstrate that our method can be useful for multiancestry modelling, both by fitting population-specific effect sizes and by more accurately accounting for covariate effects across groups. We additionally find a subset of genes where there is strong evidence that an ancestry-specific approach improves eQTL modelling.
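The central idea in the abstract is that SNP effect sizes evolve along the branches of a population tree. A small sketch can make this concrete. It is not the DendroPRS implementation, and several details are assumed for illustration only:

- the tree encoding;
- additive per-branch offsets, where a population's effect sizes are the root effects plus every offset on its root-to-leaf path;
- an L2 penalty on each offset weighted by 1 / branch length;
- all names, including `leaf_effects`, `branch_penalty`, and the populations `popA`, `popB`, `popC`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tree encoding: node -> (parent, branch_length); the root's parent is None.
Tree = Dict[str, Tuple[Optional[str], float]]


def path_to_root(tree: Tree, leaf: str) -> List[str]:
    """Nodes whose incoming branch lies on the path from `leaf` up to the root."""
    path: List[str] = []
    node = leaf
    parent = tree[node][0]
    while parent is not None:
        path.append(node)
        node = parent
        parent = tree[node][0]
    return path


def leaf_effects(tree: Tree, root_beta: List[float],
                 deltas: Dict[str, List[float]], leaf: str) -> List[float]:
    """Assumed model: population effects = root effects + all branch offsets on the path."""
    beta = list(root_beta)
    for node in path_to_root(tree, leaf):
        beta = [b + d for b, d in zip(beta, deltas[node])]
    return beta


def prs(genotypes: List[float], beta: List[float]) -> float:
    """Linear polygenic score: sum of allele dosages times effect sizes."""
    return sum(g * b for g, b in zip(genotypes, beta))


def branch_penalty(tree: Tree, deltas: Dict[str, List[float]], lam: float) -> float:
    """Assumed L2 penalty on branch offsets, weighted by 1 / branch length,
    so longer (more divergent) branches are allowed larger changes."""
    total = 0.0
    for node, (parent, length) in tree.items():
        if parent is None:  # the root carries no branch offset
            continue
        total += (lam / length) * sum(d * d for d in deltas[node])
    return total


# Toy example with hypothetical populations and made-up parameter values.
tree: Tree = {
    "root": (None, 0.0),
    "anc": ("root", 1.0),  # ancestor shared by popA and popB
    "popA": ("anc", 0.5),
    "popB": ("anc", 0.5),
    "popC": ("root", 1.5),
}
root_beta = [0.2, -0.1]
deltas = {"anc": [0.0, 0.05], "popA": [0.1, 0.0],
          "popB": [-0.1, 0.0], "popC": [0.0, 0.2]}

beta_a = leaf_effects(tree, root_beta, deltas, "popA")
score = prs([2, 1], beta_a)
penalty = branch_penalty(tree, deltas, lam=1.0)
```

Because `popA` and `popB` share the `anc` branch, its offset shifts both populations' effect sizes together. This is one way closely related groups could borrow strength from each other. In an actual fit, the offsets would presumably be learned by minimizing a prediction loss plus a penalty of this kind.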

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank. Journal: predicted probability (papers in training set; Top %)

1. The American Journal of Human Genetics: 41.1% (206 papers in training set; Top 0.1%)
2. Nature Communications: 6.6% (4913 papers in training set; Top 27%)
3. Bioinformatics: 5.0% (1061 papers in training set; Top 4%)

50% of probability mass above

4. Nature Genetics: 4.5% (240 papers in training set; Top 2%)
5. PLOS Computational Biology: 3.7% (1633 papers in training set; Top 9%)
6. Genome Biology: 3.7% (555 papers in training set; Top 2%)
7. PLOS Genetics: 3.7% (756 papers in training set; Top 4%)
8. Genome Medicine: 3.2% (154 papers in training set; Top 3%)
9. Cell Systems: 2.7% (167 papers in training set; Top 5%)
10. BMC Bioinformatics: 1.8% (383 papers in training set; Top 4%)
11. Cell Genomics: 1.8% (162 papers in training set; Top 3%)
12. Genome Research: 1.8% (409 papers in training set; Top 2%)
13. Frontiers in Genetics: 1.8% (197 papers in training set; Top 4%)
14. Genetics: 1.8% (225 papers in training set; Top 2%)
15. European Journal of Human Genetics: 1.5% (49 papers in training set; Top 0.7%)
16. Bioinformatics Advances: 1.5% (184 papers in training set; Top 3%)
17. Science: 1.3% (429 papers in training set; Top 16%)
18. Proceedings of the National Academy of Sciences: 0.9% (2130 papers in training set; Top 40%)
19. PLOS ONE: 0.9% (4510 papers in training set; Top 63%)
20. GENETICS: 0.8% (189 papers in training set; Top 1%)
21. Nucleic Acids Research: 0.8% (1128 papers in training set; Top 16%)
22. Scientific Reports: 0.8% (3102 papers in training set; Top 73%)
23. eLife: 0.5% (5422 papers in training set; Top 63%)
24. Science Advances: 0.5% (1098 papers in training set; Top 34%)
25. BioData Mining: 0.5% (15 papers in training set; Top 1%)
26. Molecular Biology and Evolution: 0.5% (488 papers in training set; Top 5%)