Estimating hierarchical F-statistics from Pool-Seq data

Gautier, M.; Coronado-Zamora, M.; Vitalis, R.

2024-11-22 genetics
10.1101/2024.11.22.624688 bioRxiv

Introduced over seventy years ago, F-statistics have been and remain central to population and evolutionary genetics. Among them, FST is one of the most commonly used descriptive statistics in empirical studies, notably to characterize the structure of genetic polymorphisms within and between populations, to shed light on the evolutionary history of populations, or to identify marker loci under differential selection for adaptive traits. However, the use of FST in simplified population models can overlook important hierarchical structures, such as geographic or temporal subdivisions, potentially leading to misleading interpretations and increasing false positives in genome scans for adaptive differentiation. Hierarchical F-statistics have been introduced to account for multiple predefined levels of population structure. Several estimators have also been proposed, including robust ones implemented in the popular R package hierfstat. Nevertheless, these were primarily designed for individual genotyping data and can be computationally intensive for large genomic datasets. In this study, we extend previous work by developing unbiased method-of-moments estimators for hierarchical F-statistics tailored for Pool-Seq data, a cost-effective alternative to individual genome sequencing. These Pool-Seq estimators have been developed in an ANOVA framework, using definitions based on identity-in-state probabilities. The new estimators have been implemented in an updated version of the R package poolfstat, together with estimators for sample allele count data derived from individual genotyping data. We validate and compare the performance of these estimators through extensive simulations under a hierarchical island model. Finally, we apply these estimators to real Pool-Seq data from Drosophila melanogaster populations, demonstrating their usefulness in revealing population structure and identifying loci with high differentiation within or between groups of subpopulations and associated with spatial or temporal genetic variation.

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

1. Molecular Ecology Resources: 161 papers in training set; Top 0.1%; 52.5%
(50% of probability mass above)
2. Genetics: 225 papers in training set; Top 0.7%; 6.4%
3. PLOS Computational Biology: 1633 papers in training set; Top 9%; 3.6%
4. PLOS Genetics: 756 papers in training set; Top 4%; 3.6%
5. GENETICS: 189 papers in training set; Top 0.3%; 2.8%
6. Bioinformatics: 1061 papers in training set; Top 6%; 2.5%
7. Molecular Ecology: 304 papers in training set; Top 2%; 2.4%
8. BMC Bioinformatics: 383 papers in training set; Top 4%; 2.1%
9. G3 Genes|Genomes|Genetics: 351 papers in training set; Top 1%; 1.9%
10. Genome Research: 409 papers in training set; Top 2%; 1.8%
11. Heredity: 53 papers in training set; Top 0.1%; 1.7%
12. Nature Communications: 4913 papers in training set; Top 51%; 1.7%
13. The American Journal of Human Genetics: 206 papers in training set; Top 2%; 1.5%
14. Frontiers in Genetics: 197 papers in training set; Top 6%; 1.3%
15. Molecular Biology and Evolution: 488 papers in training set; Top 3%; 1.3%
16. BMC Genomics: 328 papers in training set; Top 3%; 1.2%
17. PLOS ONE: 4510 papers in training set; Top 66%; 0.8%
18. European Journal of Human Genetics: 49 papers in training set; Top 1%; 0.8%
19. Genetic Epidemiology: 46 papers in training set; Top 0.8%; 0.8%
20. Genome Biology and Evolution: 280 papers in training set; Top 2%; 0.8%
21. Genetics Selection Evolution: 33 papers in training set; Top 0.2%; 0.7%
22. Scientific Reports: 3102 papers in training set; Top 76%; 0.7%
23. Methods in Ecology and Evolution: 160 papers in training set; Top 3%; 0.5%
24. Journal of Heredity: 35 papers in training set; Top 0.3%; 0.5%