Testing the effectiveness of principal components in adjusting for relatedness in genetic association studies

Yao, Y.; Ochoa, A.

2019-11-29 genetics
10.1101/858399 bioRxiv

Modern genetic association studies require modeling population structure and family relatedness in order to calculate correct statistics. Principal Components Analysis (PCA) is one of the most common approaches for modeling this population structure, but nowadays the Linear Mixed-Effects Model (LMM) is believed by many to be a superior model. Remarkably, previous comparisons have been limited by testing PCA without varying the number of principal components (PCs), by simulating unrealistically simple population structures, and by not always measuring both type-I error control and predictive power. In this work, we thoroughly evaluate PCA with varying numbers of PCs alongside LMM in various realistic scenarios, including admixture together with family structure, measuring both null p-value uniformity and the area under the precision-recall curves. We find that PCA performs as well as LMM when enough PCs are used and the sample size is large, and find a remarkable robustness to extreme numbers of PCs. However, we notice decreased performance for PCA relative to LMM when sample sizes are small and when there is family structure, although LMM performance is highly variable. Altogether, our work suggests that PCA is a favorable approach for association studies when sample sizes are large and no close relatives exist in the data, and a hybrid approach of LMM with PCs may be the best of both worlds.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Genetic Epidemiology (46 papers in training set, Top 0.1%, 22.7%)
2. Frontiers in Genetics (197 papers in training set, Top 0.1%, 14.5%)
3. BMC Bioinformatics (383 papers in training set, Top 1.0%, 9.2%)
4. PLOS Genetics (756 papers in training set, Top 2%, 6.9%)

50% of probability mass above

5. Bioinformatics (1061 papers in training set, Top 4%, 6.4%)
6. PLOS ONE (4510 papers in training set, Top 36%, 4.0%)
7. European Journal of Human Genetics (49 papers in training set, Top 0.3%, 3.3%)
8. The American Journal of Human Genetics (206 papers in training set, Top 2%, 2.8%)
9. G3 Genes|Genomes|Genetics (351 papers in training set, Top 1.0%, 2.1%)
10. Genetics (225 papers in training set, Top 2%, 1.9%)
11. Statistics in Medicine (34 papers in training set, Top 0.2%, 1.7%)
12. Behavior Genetics (15 papers in training set, Top 0.1%, 1.7%)
13. Forensic Science International: Genetics (24 papers in training set, Top 0.1%, 1.7%)
14. Scientific Reports (3102 papers in training set, Top 62%, 1.5%)
15. Briefings in Bioinformatics (326 papers in training set, Top 5%, 1.2%)
16. PLOS Computational Biology (1633 papers in training set, Top 20%, 1.2%)
17. Genetics Selection Evolution (33 papers in training set, Top 0.1%, 1.0%)
18. Human Genetics and Genomics Advances (70 papers in training set, Top 0.6%, 0.9%)
19. International Journal of Epidemiology (74 papers in training set, Top 2%, 0.9%)
20. BMC Genomics (328 papers in training set, Top 4%, 0.9%)
21. Human Brain Mapping (295 papers in training set, Top 4%, 0.9%)
22. Human Molecular Genetics (130 papers in training set, Top 4%, 0.6%)
23. Genes (126 papers in training set, Top 4%, 0.6%)
24. GENETICS (189 papers in training set, Top 2%, 0.6%)
25. Biology (43 papers in training set, Top 4%, 0.5%)
26. IEEE/ACM Transactions on Computational Biology and Bioinformatics (32 papers in training set, Top 0.8%, 0.5%)