Distinguishing between histories of speciation and introgression using genomic data

Hibbins, M. S.; Hahn, M. W.

2022-09-09 evolutionary biology
10.1101/2022.09.07.506990 bioRxiv
Introgression creates complex, non-bifurcating relationships among species. At individual loci and across the genome, introgression and incomplete lineage sorting interact to produce a wide range of different gene tree topologies. These processes can obscure the history of speciation among lineages, and, as a result, identifying the history of speciation vs. introgression remains a challenge. Here, we use theory and simulation to investigate how introgression can mislead multiple approaches to species tree inference. We find that arbitrarily low amounts of introgression can mislead both gene tree methods and parsimony methods if the rate of incomplete lineage sorting is sufficiently high. We also show that an alternative approach based on minimum gene tree node heights is inconsistent and depends on the rate of introgression across the genome. To distinguish between speciation and introgression, we apply supervised machine learning models to a set of features that can easily be obtained from phylogenomic datasets. We find that several of these models are highly accurate in classifying the species history in simulated datasets. We also show that, if the histories of speciation and introgression can be identified, PhyloNet will return highly accurate estimates of the contribution of each history to the data (i.e., edge weights). Overall, our results highlight the promise of supervised machine learning as a potentially powerful complement to phylogenetic methods in the analysis of introgression from genomic data.

Matching journals

The top 2 journals together account for at least 50% of the predicted probability mass (33.2% + 22.7% = 55.9%).

1. Systematic Biology | 121 papers in training set | Top 0.1% | 33.2%
2. Molecular Biology and Evolution | 488 papers in training set | Top 0.1% | 22.7%
50% of probability mass above
3. Evolution | 199 papers in training set | Top 0.5% | 6.4%
4. PLOS Computational Biology | 1633 papers in training set | Top 8% | 4.3%
5. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 16% | 4.2%
6. Genetics | 225 papers in training set | Top 1% | 3.6%
7. Methods in Ecology and Evolution | 160 papers in training set | Top 1% | 1.8%
8. Proceedings of the Royal Society B: Biological Sciences | 341 papers in training set | Top 4% | 1.5%
9. Science | 429 papers in training set | Top 16% | 1.3%
10. GENETICS | 189 papers in training set | Top 0.9% | 1.2%
11. Genome Biology and Evolution | 280 papers in training set | Top 1% | 1.2%
12. eLife | 5422 papers in training set | Top 50% | 1.1%
13. Bioinformatics | 1061 papers in training set | Top 8% | 1.0%
14. Ecology Letters | 121 papers in training set | Top 1% | 0.9%
15. Evolution Letters | 71 papers in training set | Top 2% | 0.8%
16. Current Biology | 596 papers in training set | Top 14% | 0.7%
17. BMC Ecology and Evolution | 49 papers in training set | Top 2% | 0.7%
18. Journal of Evolutionary Biology | 98 papers in training set | Top 1% | 0.6%
19. Bulletin of Mathematical Biology | 84 papers in training set | Top 2% | 0.6%
20. Virus Evolution | 140 papers in training set | Top 2% | 0.6%
21. Journal of Computational Biology | 37 papers in training set | Top 0.8% | 0.5%
22. Genome Research | 409 papers in training set | Top 5% | 0.5%
23. G3 Genes|Genomes|Genetics | 351 papers in training set | Top 3% | 0.5%
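
The "50% of probability mass" cutoff in the list above can be checked with a short cumulative sum. The sketch below is illustrative only: it assumes the cutoff marks the shortest ranked prefix whose summed probability reaches 50%, which the page does not state explicitly, and the function name `journals_to_reach` is hypothetical. The five probabilities are copied from the top of the list.

```python
# (journal, predicted probability in percent), copied from the top five entries above.
predictions = [
    ("Systematic Biology", 33.2),
    ("Molecular Biology and Evolution", 22.7),
    ("Evolution", 6.4),
    ("PLOS Computational Biology", 4.3),
    ("Proceedings of the National Academy of Sciences", 4.2),
]


def journals_to_reach(ranked, threshold=50.0):
    """Return (count, cumulative) for the shortest ranked prefix whose
    summed probability reaches `threshold` percent.

    Assumption: this is how the page defines its cutoff; the page itself
    does not say. Returns (None, total) if the threshold is never reached.
    """
    cumulative = 0.0
    for count, (_, prob) in enumerate(ranked, start=1):
        cumulative += prob
        if cumulative >= threshold:
            return count, cumulative
    return None, cumulative


count, cumulative = journals_to_reach(predictions)
print(count, round(cumulative, 1))
```

Under that assumption, the first journal alone (33.2%) falls short of the threshold and adding the second crosses it, which matches where the marker sits in the list.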