Enhancing Detection of Polygenic Adaptation: A Comparative Study of Machine Learning and Statistical Approaches Using Simulated Evolve-and-Resequence Data
Caliendo, C.; Gerber, S.; Pfenninger, M.
Detecting signals of polygenic adaptation remains a significant challenge in population genomics, as traditional methods often struggle to identify the subtle, multi-locus allele-frequency shifts associated with it. Here, we introduced and tested several novel approaches that combine machine learning techniques with traditional statistical tests to detect polygenic adaptation patterns in time series of allele-frequency changes from whole-genome data. We implemented a Naive Bayesian Classifier (NBC) and a One-Class Support Vector Machine (OCSVM) and compared their performance against the classical Fisher's Exact Test (FET). We further combined the machine learning and statistical models (OCSVM-FET and NBC-FET), yielding five competing approaches. Using a simulated data set based on empirical evolve-and-resequence genomic data from Chironomus riparius, we evaluated the methods across evolutionary scenarios varying in number of generations, selection strength, and number of loci under selection. Our results demonstrate that the combined OCSVM-FET approach consistently outperformed the competing methods, achieving the lowest false positive rate, the highest area under the curve (AUC), and high accuracy. Its performance peaked in what we term the late dynamic phase of adaptation (the period after initial selection has occurred but before fixation), highlighting the method's sensitivity to ongoing selective processes and thus its value for experimental approaches. We also emphasize the critical role of parameter tuning in balancing biological assumptions with methodological rigor. Our approach offers a powerful tool for detecting polygenic adaptation from time-series data, such as pool-sequencing data from evolve-and-resequence experiments.
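The FET component compares, for each SNP, allele counts between two time points in a 2x2 table. The following is a minimal pure-Python sketch of that step only, assuming pool-seq read counts are used directly as allele counts (ignoring the extra sampling noise introduced by pooling) and a Bonferroni-corrected threshold across SNPs. The function names, the input layout, and the correction are illustrative assumptions, not the authors' pipeline; the NBC and OCSVM components are not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: time points (e.g. generation 0, generation t).
    Columns: allele counts (reference, alternative).
    The p-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def flag_candidate_snps(counts, alpha=0.05):
    """Return indices of SNPs whose allele counts shift significantly.

    counts: list of (ref_t0, alt_t0, ref_t1, alt_t1) tuples, one per SNP
    (hypothetical input layout). Uses a Bonferroni threshold alpha / n_snps.
    """
    threshold = alpha / len(counts)
    return [i for i, (r0, a0, r1, a1) in enumerate(counts)
            if fisher_exact_two_sided(r0, a0, r1, a1) < threshold]


# Usage: only the second SNP shows a strong frequency shift.
counts = [(50, 50, 48, 52), (90, 10, 20, 80), (30, 30, 31, 29)]
print(flag_candidate_snps(counts))  # [1]
```

A per-SNP test like this has no notion of coordinated, multi-locus change, which is the gap the abstract describes the machine learning components (and their combination with FET) as addressing.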