Estimating hierarchical F-statistics from Pool-Seq data

Gautier, M.; Coronado-Zamora, M.; Vitalis, R.

2024-11-22 genetics

10.1101/2024.11.22.624688 bioRxiv


Introduced over seventy years ago, F-statistics have been, and remain, central to population and evolutionary genetics. Among them, FST is one of the most commonly used descriptive statistics in empirical studies, notably to characterize the structure of genetic polymorphisms within and between populations, to shed light on the evolutionary history of populations, or to identify marker loci under differential selection for adaptive traits. However, applying FST under simplified population models can overlook important hierarchical structures, such as geographic or temporal subdivisions, potentially leading to misleading interpretations and to more false positives in genome scans for adaptive differentiation. Hierarchical F-statistics have been introduced to account for multiple predefined levels of population structure. Several estimators have also been proposed, including robust ones implemented in the popular R package hierfstat. Nevertheless, these were primarily designed for individual genotyping data and can be computationally intensive for large genomic datasets. In this study, we extend previous work by developing unbiased method-of-moments estimators of hierarchical F-statistics tailored to Pool-Seq data, a cost-effective alternative to individual genome sequencing. These Pool-Seq estimators were developed in an ANOVA framework, using definitions based on identity-in-state probabilities. The new estimators have been implemented in an updated version of the R package poolfstat, together with estimators for sample allele count data derived from individual genotyping data. We validate and compare the performance of these estimators through extensive simulations under a hierarchical island model.
Finally, we apply these estimators to real Pool-Seq data from Drosophila melanogaster populations, demonstrating their usefulness in revealing population structure and in identifying loci that are highly differentiated within or between groups of subpopulations and that are associated with spatial or temporal genetic variation.
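To make "definitions based on identity-in-state probabilities" concrete, here is a minimal sketch under stated assumptions. It assumes a three-level hierarchy (subpopulations nested within groups) and the classical parametrization in which Q1, Q2 and Q3 are the probabilities that two genes are identical in state when drawn from the same subpopulation, from different subpopulations of the same group, and from different groups, respectively. The names `hierarchical_f`, `F_SC`, `F_CT` and `F_ST` are illustrative labels chosen here, not the poolfstat API, and the function computes parameters from known Q values; it is not the paper's Pool-Seq estimator, which must additionally correct for the sampling of individuals into pools and of reads from pools.

```python
def hierarchical_f(q1: float, q2: float, q3: float) -> dict:
    """Hierarchical F-statistics from identity-in-state probabilities.

    Assumed (illustrative) parametrization:
      q1: IIS probability, two genes from the same subpopulation
      q2: IIS probability, genes from different subpopulations, same group
      q3: IIS probability, genes from different groups
    """
    # Denominators 1 - q2 and 1 - q3 must be non-zero.
    if q2 >= 1.0 or q3 >= 1.0:
        raise ValueError("q2 and q3 must be strictly less than 1")
    return {
        # Differentiation of subpopulations within groups.
        "F_SC": (q1 - q2) / (1.0 - q2),
        # Differentiation of groups within the whole sample.
        "F_CT": (q2 - q3) / (1.0 - q3),
        # Overall differentiation of subpopulations.
        "F_ST": (q1 - q3) / (1.0 - q3),
    }


if __name__ == "__main__":
    f = hierarchical_f(0.6, 0.5, 0.4)
    # The three levels combine multiplicatively:
    # (1 - F_ST) = (1 - F_SC) * (1 - F_CT)
    print(f)
    print(1 - f["F_ST"], (1 - f["F_SC"]) * (1 - f["F_CT"]))
```

The multiplicative identity follows directly from the definitions, since (1 - F_SC)(1 - F_CT) = [(1 - Q1)/(1 - Q2)] [(1 - Q2)/(1 - Q3)] = (1 - Q1)/(1 - Q3) = 1 - F_ST, which is why a hierarchical decomposition can attribute overall differentiation to each level of structure.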

