
Rapid Estimation of SNP Heritability using Predictive Process approximation in Large scale Cohort Studies

Seal, S.; Datta, A.; Basu, S.

2021-05-14 bioinformatics
10.1101/2021.05.12.443931 bioRxiv

With the advent of high-throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on a cohort of distantly related individuals using a linear mixed model (LMM). Fitting such an LMM in a large-scale cohort study, however, is tremendously challenging because of the high-dimensional linear algebraic operations involved. In this paper, we propose a new method named PredLMM that approximates the aforementioned LMM, motivated by the concepts of genetic coalescence and the Gaussian predictive process. PredLMM has substantially better computational complexity than most existing LMM-based methods and thus provides a fast alternative for estimating heritability in large-scale cohort studies. Theoretically, we show that under a model of genetic coalescence, the limiting form of our approximation is the celebrated predictive process approximation of large Gaussian process likelihoods, which has well-established accuracy standards. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits from the UK Biobank cohort.
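For readers unfamiliar with the baseline that the abstract calls computationally challenging, the following is a minimal sketch of a standard full-LMM heritability estimate. It is not the PredLMM approximation. It assumes the variance-component model y ~ N(0, s2 * (h2 * K + (1 - h2) * I)), where K is a genetic relationship matrix built from standardized genotypes. The function name `estimate_h2`, the grid search, and the simulated data are illustrative assumptions, not taken from the paper. The eigendecomposition of the n x n matrix K costs on the order of n^3 operations, which is the kind of step that becomes prohibitive in large cohorts.

```python
import numpy as np

def estimate_h2(y, K, grid=np.linspace(0.0, 0.99, 100)):
    """Grid-search ML estimate of h2 under y ~ N(0, s2 * (h2*K + (1-h2)*I)).

    Hypothetical helper for illustration only; not the PredLMM method.
    """
    n = y.shape[0]
    y = y - y.mean()                      # remove the intercept
    evals, evecs = np.linalg.eigh(K)      # O(n^3) step: the bottleneck
    evals = np.clip(evals, 0.0, None)     # guard against tiny negative eigenvalues
    y_rot = evecs.T @ y                   # rotate so the covariance is diagonal
    best_h2, best_nll = None, np.inf
    for h2 in grid:
        d = h2 * evals + (1.0 - h2)       # diagonal covariance up to the scale s2
        s2 = np.mean(y_rot ** 2 / d)      # total variance profiled out in closed form
        nll = 0.5 * (n * np.log(s2) + np.sum(np.log(d)) + n)
        if nll < best_nll:
            best_h2, best_nll = h2, nll
    return best_h2

# Simulated toy data (assumed sizes; real cohorts are far larger).
rng = np.random.default_rng(0)
n, m, true_h2 = 300, 2000, 0.5
freqs = rng.uniform(0.1, 0.5, size=m)                  # allele frequencies
G = rng.binomial(2, freqs, size=(n, m)).astype(float)  # genotypes coded 0/1/2
Z = (G - G.mean(axis=0)) / G.std(axis=0)               # standardize each SNP
K = Z @ Z.T / m                                        # genetic relationship matrix
beta = rng.normal(0.0, np.sqrt(true_h2 / m), size=m)   # SNP effects
y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - true_h2), size=n)

h2_hat = estimate_h2(y, K)
print(f"estimated h2: {h2_hat:.2f}")
```

With a sample this small and individuals this weakly related, the estimate is noisy; the sketch is meant only to show where the cost arises.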

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set, Top 2%, 14.5%
2. PLOS Genetics: 756 papers in training set, Top 1%, 9.0%
3. Frontiers in Genetics: 197 papers in training set, Top 0.6%, 7.1%
4. Statistics in Medicine: 34 papers in training set, Top 0.1%, 6.7%
5. BMC Bioinformatics: 383 papers in training set, Top 2%, 6.2%
6. The Annals of Applied Statistics: 15 papers in training set, Top 0.1%, 6.2%
7. PLOS ONE: 4510 papers in training set, Top 36%, 3.9%

50% of probability mass above

8. PLOS Computational Biology: 1633 papers in training set, Top 10%, 3.5%
9. Biometrics: 22 papers in training set, Top 0.1%, 3.0%
10. Biophysical Journal: 545 papers in training set, Top 2%, 2.7%
11. Biostatistics: 21 papers in training set, Top 0.1%, 2.7%
12. Journal of Computational Biology: 37 papers in training set, Top 0.1%, 2.1%
13. Scientific Reports: 3102 papers in training set, Top 54%, 1.9%
14. Genetic Epidemiology: 46 papers in training set, Top 0.4%, 1.9%
15. The American Journal of Human Genetics: 206 papers in training set, Top 2%, 1.7%
16. Communications Biology: 886 papers in training set, Top 9%, 1.7%
17. Nature Communications: 4913 papers in training set, Top 52%, 1.7%
18. NeuroImage: 813 papers in training set, Top 4%, 1.5%
19. Genetics: 225 papers in training set, Top 3%, 1.2%
20. BioData Mining: 15 papers in training set, Top 0.6%, 1.1%
21. GENETICS: 189 papers in training set, Top 1%, 0.9%
22. Journal of Bioinformatics and Systems Biology: 14 papers in training set, Top 0.6%, 0.8%
23. Physical Review E: 95 papers in training set, Top 1%, 0.7%
24. Briefings in Bioinformatics: 326 papers in training set, Top 8%, 0.6%
25. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 47%, 0.6%
26. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 3%, 0.6%
27. iScience: 1063 papers in training set, Top 38%, 0.6%
28. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set, Top 0.8%, 0.6%
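
The 50% cutoff stated for this ranking can be checked by accumulating the listed probabilities in rank order. The short sketch below does that with the 28 percentages copied from the list. The variable names are illustrative, and the listed values are rounded to one decimal place, so the running totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (in percent) in rank order, as listed above.
probs = [14.5, 9.0, 7.1, 6.7, 6.2, 6.2, 3.9, 3.5, 3.0, 2.7,
         2.7, 2.1, 1.9, 1.9, 1.7, 1.7, 1.7, 1.5, 1.2, 1.1,
         0.9, 0.8, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6]

# Running total of probability mass after each rank.
cumulative = list(accumulate(probs))

# First rank (1-based) at which the running total reaches 50%.
cutoff_rank = next(i for i, total in enumerate(cumulative, start=1) if total >= 50.0)

print(f"journals needed to reach 50%: {cutoff_rank}")
print(f"mass covered by those journals: {cumulative[cutoff_rank - 1]:.1f}%")
```

The first six journals sum to roughly 49.7%, so the seventh is the one that carries the total past 50%, consistent with the statement that the top 7 journals account for 50% of the predicted probability mass.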