Summary statistics and approximate Bayesian computation are comparable to convolutional neural networks for inferring times to fixation

Roberts, M.; Josephs, E. B.

2026-02-18 evolutionary biology
10.64898/2026.02.17.706432 bioRxiv

Detecting signatures of positive selection in genomes is a common application of population genetics, and one of the most influential models for this task is the hard selective sweep, in which a de novo mutation rapidly fixes. Many statistics have been developed to detect hard sweeps, often by summarizing the signatures they leave in the site frequency spectrum, linkage disequilibrium, and haplotype frequencies. However, undiscovered signals could still exist. We tested whether any exist by comparing machine learning models, which can learn signatures from raw data without prior knowledge, to known summary statistics for inferring the time to fixation (tf) of a hard sweep against a background of variable sweep ages (ta). Across approximately 200,000 simulations encompassing 5 demographic scenarios of single panmictic populations, machine learning models trained directly on raw genotype data failed to predict tf better than methods based purely on common summary statistics. This suggests that few undiscovered signals remain in single-timepoint, single-population genotype data that can better disentangle tf and ta of hard sweeps.
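To make the notion of a "summary statistic" concrete, here is a minimal sketch of two of the quantities the abstract names: the site frequency spectrum and haplotype frequencies. It is not the authors' pipeline. The function names, the toy haplotype matrix, and the 0/1 (ancestral/derived) encoding are assumptions made purely for illustration.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 counts sites whose derived allele
    appears in exactly i of the n sampled haplotypes."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    # zip(*haplotypes) iterates over sites (columns) of the matrix
    for site in zip(*haplotypes):
        derived = sum(site)
        if 0 < derived < n:  # skip monomorphic (fixed or absent) sites
            sfs[derived - 1] += 1
    return sfs


def haplotype_frequencies(haplotypes):
    """Frequencies of distinct haplotypes, sorted from most to least common."""
    n = len(haplotypes)
    counts = Counter(tuple(h) for h in haplotypes)
    return sorted((c / n for c in counts.values()), reverse=True)


# Hypothetical toy sample: 4 haplotypes (rows) x 4 sites (columns),
# with 0 = ancestral allele and 1 = derived allele.
haps = [
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]

sfs = site_frequency_spectrum(haps)
hap_freqs = haplotype_frequencies(haps)
print(sfs)        # [1, 1, 0]
print(hap_freqs)  # [0.5, 0.25, 0.25]
```

Statistics of this kind compress a genotype matrix into a few numbers, whereas the machine learning models described above take the raw matrix as input.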

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | PLOS Genetics | 756 | Top 0.1% | 25.5%
2 | Molecular Biology and Evolution | 488 | Top 0.4% | 9.9%
3 | Proceedings of the National Academy of Sciences | 2130 | Top 9% | 7.1%
4 | eLife | 5422 | Top 15% | 6.2%
5 | Genetics | 225 | Top 0.9% | 4.8%
(50% of probability mass above)
6 | PLOS Computational Biology | 1633 | Top 7% | 4.8%
7 | GENETICS | 189 | Top 0.1% | 4.8%
8 | The American Journal of Human Genetics | 206 | Top 1% | 4.3%
9 | Science | 429 | Top 7% | 4.2%
10 | Nature Communications | 4913 | Top 43% | 2.8%
11 | Philosophical Transactions of the Royal Society B | 51 | Top 2% | 2.1%
12 | Evolution | 199 | Top 1% | 1.7%
13 | Evolution Letters | 71 | Top 1% | 1.7%
14 | Science Advances | 1098 | Top 19% | 1.6%
15 | Nature Ecology & Evolution | 113 | Top 3% | 1.3%
16 | Genome Biology | 555 | Top 6% | 1.2%
17 | Genome Biology and Evolution | 280 | Top 1% | 1.1%
18 | Molecular Ecology | 304 | Top 4% | 0.9%
19 | Peer Community Journal | 254 | Top 4% | 0.8%
20 | New Phytologist | 309 | Top 5% | 0.7%
21 | G3 Genes|Genomes|Genetics | 351 | Top 3% | 0.7%
22 | Genome Research | 409 | Top 5% | 0.7%
23 | Scientific Reports | 3102 | Top 79% | 0.6%
24 | Frontiers in Genetics | 197 | Top 12% | 0.6%
25 | PLOS Biology | 408 | Top 23% | 0.6%