
Enhancing Detection of Polygenic Adaptation: A Comparative Study of Machine Learning and Statistical Approaches Using Simulated Evolve-and-Resequence Data

Caliendo, C.; Gerber, S.; Pfenninger, M.

2026-02-24 genetics
10.1101/2024.11.28.625827 bioRxiv

Detecting signals of polygenic adaptation remains a significant challenge in population genomics, as traditional methods often struggle to identify the associated subtle, multi-locus allele-frequency shifts. Here, we introduced and tested several novel approaches that combine machine learning techniques with traditional statistical tests to detect polygenic adaptation patterns in time series of allele-frequency changes from whole-genome data. We implemented a Naive Bayesian Classifier (NBC) and One-Class Support Vector Machines (OCSVM), and compared their performance against the classical Fisher's Exact Test (FET). Furthermore, we combined machine learning and statistical models (OCSVM-FET and NBC-FET), resulting in five competing approaches. Using a simulated data set based on empirical evolve-and-resequence genomic data from Chironomus riparius, we evaluated the methods across evolutionary scenarios that varied in the number of generations, the selection strength, and the number of loci under selection. Our results demonstrate that the combined OCSVM-FET approach consistently outperformed the competing methods, achieving the lowest false positive rate, the highest area under the curve, and high accuracy. The performance peak aligned with what we term the late dynamic phase of adaptation (the period after initial selection has occurred but before fixation), highlighting the method's sensitivity to ongoing selective processes and thus its value for experimental approaches. Furthermore, we emphasize the critical role of parameter tuning, which must balance biological assumptions with methodological rigor. Our approach offers a powerful tool for detecting polygenic adaptation from time series, e.g. pool-sequencing data from evolve-and-resequence experiments.
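As a rough illustration of the FET baseline named in the abstract, the sketch below computes a two-sided Fisher's exact test on a 2x2 table of allele counts at one locus, comparing a starting generation with a later one. The abstract does not describe the authors' pipeline, so everything here is an assumption made for illustration: the function name `fisher_exact_two_sided`, the count layout, and the example numbers are hypothetical, and the OCSVM and NBC components are not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption, not taken from the paper):
        a, b = alternate / reference allele counts at the first time point
        c, d = alternate / reference allele counts at the later time point
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one; the small tolerance guards against float ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical locus: alternate allele rises from 30/100 to 60/100 reads.
p = fisher_exact_two_sided(30, 70, 60, 40)
print(f"p-value for the example locus: {p:.3g}")
```

In a genome-wide scan, a test like this would be run per locus and the resulting p-values corrected for multiple testing; how such p-values are combined with a classifier score in the OCSVM-FET approach is not specified in the abstract.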

Matching journals

The top 5 journals together account for just over 50% of the predicted probability mass (about 52.7% cumulatively).

1. Molecular Ecology Resources; 161 papers in training set; Top 0.1%; 23.0%
2. BMC Genomics; 328 papers in training set; Top 0.1%; 10.3%
3. PLOS Genetics; 756 papers in training set; Top 1%; 8.6%
4. PLOS Computational Biology; 1633 papers in training set; Top 5%; 7.0%
5. Frontiers in Genetics; 197 papers in training set; Top 2%; 3.8%
(50% of probability mass above)
6. Genetics Selection Evolution; 33 papers in training set; Top 0.1%; 3.7%
7. Bioinformatics; 1061 papers in training set; Top 6%; 3.3%
8. BMC Bioinformatics; 383 papers in training set; Top 3%; 3.1%
9. Briefings in Bioinformatics; 326 papers in training set; Top 2%; 2.8%
10. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 0.8%; 2.8%
11. Molecular Ecology; 304 papers in training set; Top 2%; 2.1%
12. PLOS ONE; 4510 papers in training set; Top 47%; 2.1%
13. Bioinformatics Advances; 184 papers in training set; Top 2%; 1.9%
14. Molecular Biology and Evolution; 488 papers in training set; Top 2%; 1.7%
15. GENETICS; 189 papers in training set; Top 0.6%; 1.7%
16. Genome Biology and Evolution; 280 papers in training set; Top 1%; 1.4%
17. Genomics, Proteomics & Bioinformatics; 171 papers in training set; Top 4%; 1.4%
18. Nucleic Acids Research; 1128 papers in training set; Top 12%; 1.4%
19. Nature Communications; 4913 papers in training set; Top 56%; 1.3%
20. PeerJ; 261 papers in training set; Top 11%; 1.0%
21. NAR Genomics and Bioinformatics; 214 papers in training set; Top 3%; 0.9%
22. Scientific Reports; 3102 papers in training set; Top 70%; 0.9%
23. Genetic Epidemiology; 46 papers in training set; Top 0.7%; 0.8%
24. Genetics; 225 papers in training set; Top 4%; 0.8%
25. Journal of Genetics and Genomics; 36 papers in training set; Top 2%; 0.8%
26. eLife; 5422 papers in training set; Top 58%; 0.7%
27. Peer Community Journal; 254 papers in training set; Top 4%; 0.7%
28. Journal of Evolutionary Biology; 98 papers in training set; Top 1%; 0.7%
29. Journal of Bioinformatics and Systems Biology; 14 papers in training set; Top 0.9%; 0.7%
30. Genome Research; 409 papers in training set; Top 5%; 0.5%