
FST and kinship for arbitrary population structures I: Generalized definitions

Ochoa, A.; Storey, J. D.

2019-06-05 genetics
10.1101/083915 bioRxiv

FST is a fundamental measure of genetic differentiation and population structure, currently defined for subdivided populations. FST in practice typically assumes independent, non-overlapping subpopulations, which all split simultaneously from their last common ancestral population so that genetic drift in each subpopulation is probabilistically independent of the other subpopulations. We introduce a generalized FST definition for arbitrary population structures, where individuals may be related in arbitrary ways, allowing for arbitrary probabilistic dependence among individuals. Our definitions are built on identity-by-descent (IBD) probabilities that relate individuals through inbreeding and kinship coefficients. We generalize FST as the mean inbreeding coefficient of the individuals' local populations relative to their last common ancestral population. We show that the generalized definition agrees with Wright's original definition and with the independent-subpopulation definition as special cases. We define a novel coancestry model based on "individual-specific allele frequencies" and prove that its parameters correspond to probabilistic kinship coefficients. Lastly, we extend the Pritchard-Stephens-Donnelly admixture model in the context of our coancestry model and calculate its FST. To motivate this work, we include a summary of analyses we have carried out in follow-up papers, where our new approach has been applied to simulations and global human data, showcasing the complexity of human population structure, demonstrating our success in estimating kinship and FST, and revealing the shortcomings of existing approaches. The probabilistic framework we introduce here provides a theoretical foundation that extends FST in terms of inbreeding and kinship coefficients to arbitrary population structures, paving the way for new estimators and novel analyses.

Note: This article is Part I of a two-part manuscript. We refer to the two parts in the text as Part I and Part II, respectively.

Part I: Alejandro Ochoa and John D. Storey. "FST and kinship for arbitrary population structures I: Generalized definitions". bioRxiv (10.1101/083915) (2019). https://doi.org/10.1101/083915. First published 2016-10-27.

Part II: Alejandro Ochoa and John D. Storey. "FST and kinship for arbitrary population structures II: Method of moments estimators". bioRxiv (10.1101/083923) (2019). https://doi.org/10.1101/083923. First published 2016-10-27.
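To make the coancestry model's parameters concrete, here is a minimal Python sketch of its simplest special case, independent subpopulations. It assumes (our reading, not a quote of the paper) that the model is stated through moments of the individual-specific allele frequencies, E[π_ij] = p_i^T and Cov(π_ij, π_ik) = p_i^T(1 - p_i^T)θ_jk^T. Each subpopulation's frequency is drawn from the Balding-Nichols Beta distribution, which is swapped in here for simulation and is not necessarily the authors' choice. Two individuals in the same subpopulation share one frequency draw, so their estimated θ should land near that subpopulation's F_u; individuals in different subpopulations draw independently, so theirs should land near zero. All function names are hypothetical.

```python
import random
import statistics


def balding_nichols(p_anc: float, fst: float, rng: random.Random) -> float:
    """Draw one subpopulation allele frequency with mean p_anc and
    variance p_anc * (1 - p_anc) * fst (Balding-Nichols Beta model)."""
    nu = 1.0 / fst - 1.0  # Beta concentration alpha + beta
    return rng.betavariate(p_anc * nu, (1.0 - p_anc) * nu)


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_coancestry(p_anc: float, f_sub: float,
                        n_reps: int = 50_000, seed: int = 1) -> dict:
    """Estimate theta_jk = Cov(pi_j, pi_k) / (p * (1 - p)) for a pair of
    individuals in the same subpopulation and a pair in different ones
    (hypothetical helper, independent-subpopulation special case only)."""
    rng = random.Random(seed)
    same_j, same_k, diff_j, diff_k = [], [], [], []
    for _ in range(n_reps):
        shared = balding_nichols(p_anc, f_sub, rng)  # one draw shared by both
        same_j.append(shared)
        same_k.append(shared)
        diff_j.append(balding_nichols(p_anc, f_sub, rng))  # independent draws
        diff_k.append(balding_nichols(p_anc, f_sub, rng))
    scale = p_anc * (1.0 - p_anc)
    return {
        "mean": statistics.fmean(diff_j),
        "theta_same": sample_cov(same_j, same_k) / scale,
        "theta_diff": sample_cov(diff_j, diff_k) / scale,
    }


if __name__ == "__main__":
    # Ancestral frequency 0.3, subpopulation inbreeding F_u = 0.1
    print(simulate_coancestry(0.3, 0.1))
```

With p = 0.3 and F_u = 0.1, theta_same should come out close to 0.1 and theta_diff close to 0; the exact digits depend on the seed and the number of replicates.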

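The admixture extension and the generalized FST can be sketched in the same hedged way. Assume that each individual's allele frequency is an admixture-proportion-weighted mixture of intermediate subpopulation frequencies, π_ij = Σ_u q_ju p_i^{S_u}, and that those intermediate subpopulations are independent with inbreeding coefficients F_u relative to T. Under these assumptions the coancestry matrix is Θ = Q diag(F) Qᵀ. The generalized FST is then the (weighted) mean of Θ's diagonal, i.e. of the inbreeding coefficients of the individuals' local populations. The kinship helper follows the coancestry-to-kinship relation as we understand it: off-diagonal kinship equals coancestry, and the diagonal uses the standard diploid identity φ_jj = (1 + f_j)/2. Function names and the toy Q and F are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def coancestry_admix(q: Sequence[Sequence[float]],
                     f_sub: Sequence[float]) -> list[list[float]]:
    """Theta = Q diag(F) Q^T, assuming independent intermediate subpopulations.

    q[j][u] is individual j's admixture proportion from subpopulation u;
    f_sub[u] is subpopulation u's inbreeding coefficient relative to T.
    """
    n, k = len(q), len(f_sub)
    return [
        [sum(q[j][u] * q[m][u] * f_sub[u] for u in range(k)) for m in range(n)]
        for j in range(n)
    ]


def fst_from_coancestry(theta, weights=None) -> float:
    """Generalized FST: weighted mean of the diagonal of theta (the local
    populations' inbreeding coefficients). Weights should sum to 1;
    uniform weights are used by default."""
    n = len(theta)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * theta[j][j] for j, w in enumerate(weights))


def kinship_from_coancestry(theta):
    """Kinship matrix: off-diagonal entries equal coancestry; the diagonal
    holds diploid self-kinship (1 + f_j) / 2."""
    n = len(theta)
    return [
        [(1.0 + theta[j][j]) / 2.0 if j == m else theta[j][m] for m in range(n)]
        for j in range(n)
    ]


if __name__ == "__main__":
    # Two unadmixed individuals and one 50/50 admixed individual.
    q = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    f_sub = [0.1, 0.3]
    theta = coancestry_admix(q, f_sub)
    print(fst_from_coancestry(theta))
```

For this toy input, the admixed individual's local inbreeding coefficient is 0.25 × 0.1 + 0.25 × 0.3 = 0.1, so FST = (0.1 + 0.3 + 0.1)/3 ≈ 0.167. When every row of Q is one-hot (no admixture), the same function reduces to the mean of each individual's subpopulation F_u, in line with the independent-subpopulation special case the abstract mentions.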
Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. GENETICS: 189 papers in training set; Top 0.1%; probability 21.9%
2. Genetics: 225 papers in training set; Top 0.3%; probability 14.0%
3. Bioinformatics: 1061 papers in training set; Top 3%; probability 9.8%
4. The American Journal of Human Genetics: 206 papers in training set; Top 1.0%; probability 4.7%

(50% of probability mass above)

5. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 15%; probability 4.7%
6. G3: Genes, Genomes, Genetics: 222 papers in training set; Top 0.1%; probability 4.2%
7. Theoretical Population Biology: 47 papers in training set; Top 0.1%; probability 4.1%
8. PLOS Computational Biology: 1633 papers in training set; Top 8%; probability 4.1%
9. PLOS Genetics: 756 papers in training set; Top 5%; probability 3.6%
10. Nature Communications: 4913 papers in training set; Top 46%; probability 2.3%
11. Molecular Biology and Evolution: 488 papers in training set; Top 2%; probability 2.0%
12. G3 Genes|Genomes|Genetics: 351 papers in training set; Top 1%; probability 1.6%
13. Journal of Computational Biology: 37 papers in training set; Top 0.3%; probability 1.4%
14. Frontiers in Genetics: 197 papers in training set; Top 6%; probability 1.3%
15. eLife: 5422 papers in training set; Top 50%; probability 1.2%
16. Genetic Epidemiology: 46 papers in training set; Top 0.6%; probability 1.1%
17. Biometrics: 22 papers in training set; Top 0.1%; probability 1.1%
18. Nature Computational Science: 50 papers in training set; Top 1%; probability 0.9%
19. Bioinformatics Advances: 184 papers in training set; Top 4%; probability 0.9%
20. Communications Biology: 886 papers in training set; Top 20%; probability 0.9%
21. Genome Research: 409 papers in training set; Top 4%; probability 0.9%
22. BMC Bioinformatics: 383 papers in training set; Top 7%; probability 0.8%
23. Physical Biology: 43 papers in training set; Top 2%; probability 0.7%
24. Biostatistics: 21 papers in training set; Top 0.1%; probability 0.7%
25. Bulletin of Mathematical Biology: 84 papers in training set; Top 2%; probability 0.7%
26. European Journal of Human Genetics: 49 papers in training set; Top 2%; probability 0.6%
27. Genome Biology and Evolution: 280 papers in training set; Top 2%; probability 0.6%
28. Scientific Reports: 3102 papers in training set; Top 79%; probability 0.6%