A Rarefaction Approach to Identify Local Introgression in a Three Population Tree

Smith, T. Q.; Szpiech, Z. A.

2026-05-16 evolutionary biology
10.64898/2026.05.13.724952 bioRxiv

Patterson's D statistic, also known as the ABBA-BABA statistic, is widely used to detect the presence of archaic genome-wide introgression between two non-sister taxa. Requiring only a single lineage from each of four taxa, where one taxon acts as an outgroup to determine the ancestral allele, Patterson's D measures the imbalance between the number of biallelic sites where the second and third taxa share the derived allele (ABBA sites) and the number where the first and third taxa share it (BABA sites). When there is no introgression, these counts are expected to be equal, and a discordance between counts suggests introgression from the third taxon into either the first or second. Patterson's D is limited to the detection of genome-wide introgression and exhibits a high false-positive rate when applied to smaller genomic segments. Here, we present a new method, D STatistic with Allelic Rarefaction (D*), to address these limitations. D* uses multiple lineages and does not require an outgroup to calculate the imbalance between the number of alleles found exclusively in the second and third taxa and the number of alleles found exclusively in the first and third taxa. D* employs a rarefaction technique to correct for unequal sample sizes and allows multiallelic sites. We use simulations to show that D* has better precision and recall for detecting introgressed segments of DNA when compared to similar methods under a wide variety of model parameters and in the presence of technical artifacts common to ancient DNA analyses. We conclude with an analysis of Denisovan DNA introgression in modern-day Papuans. Precompiled executables, the manual, and source code can be found at https://github.com/TQ-Smith/DSTAR
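
To make the site-counting idea concrete, the following is a minimal sketch of the classical single-lineage Patterson's D described above, not of D* itself and not code from the DSTAR repository. The function name `patterson_d` and the input layout (one tuple of alleles per site, ordered first taxon, second taxon, third taxon, outgroup) are assumptions made for illustration only. It computes D = (nABBA - nBABA) / (nABBA + nBABA).

```python
from typing import Iterable, Optional, Tuple


def patterson_d(
    sites: Iterable[Tuple[str, str, str, str]],
) -> Tuple[int, int, Optional[float]]:
    """Count ABBA and BABA sites and return (n_abba, n_baba, D).

    Hypothetical input layout: each site is (p1, p2, p3, outgroup), one
    allele per taxon. The outgroup allele is treated as ancestral ("A")
    and the other allele as derived ("B").
    """
    n_abba = 0
    n_baba = 0
    for p1, p2, p3, outgroup in sites:
        # Classical D uses biallelic sites only.
        if len({p1, p2, p3, outgroup}) != 2:
            continue
        # The third taxon must carry the derived allele.
        if p3 == outgroup:
            continue
        if p1 == outgroup and p2 == p3:
            n_abba += 1  # second and third taxa share the derived allele
        elif p2 == outgroup and p1 == p3:
            n_baba += 1  # first and third taxa share the derived allele
    total = n_abba + n_baba
    # D is undefined when there are no informative sites.
    d = (n_abba - n_baba) / total if total > 0 else None
    return n_abba, n_baba, d


example_sites = [
    ("A", "G", "G", "A"),  # ABBA
    ("A", "G", "G", "A"),  # ABBA
    ("G", "A", "G", "A"),  # BABA
    ("A", "A", "G", "A"),  # derived allele only in third taxon: uninformative
    ("A", "C", "G", "A"),  # triallelic: skipped
]
n_abba, n_baba, d = patterson_d(example_sites)
print(n_abba, n_baba, d)
```

In practice, significance is usually assessed with a block jackknife over the genome rather than from the raw value of D; that step is omitted here to keep the sketch short.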

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Methods in Ecology and Evolution (160 papers in training set; Top 0.1%; 28.0%)
2. Molecular Biology and Evolution (488 papers in training set; Top 0.3%; 12.5%)
3. Bioinformatics (1061 papers in training set; Top 3%; 8.5%)
4. Molecular Ecology Resources (161 papers in training set; Top 0.3%; 4.0%)

50% of probability mass above

5. PLOS ONE (4510 papers in training set; Top 38%; 3.6%)
6. Genome Biology (555 papers in training set; Top 3%; 3.3%)
7. Bioinformatics Advances (184 papers in training set; Top 2%; 2.6%)
8. BMC Bioinformatics (383 papers in training set; Top 4%; 2.1%)
9. Genetics (225 papers in training set; Top 2%; 2.1%)
10. BMC Genomics (328 papers in training set; Top 2%; 2.1%)
11. Systematic Biology (121 papers in training set; Top 0.2%; 1.9%)
12. PLOS Computational Biology (1633 papers in training set; Top 14%; 1.9%)
13. eLife (5422 papers in training set; Top 39%; 1.8%)
14. Genome Biology and Evolution (280 papers in training set; Top 1.0%; 1.7%)
15. Scientific Reports (3102 papers in training set; Top 57%; 1.7%)
16. BMC Ecology and Evolution (49 papers in training set; Top 1%; 1.3%)
17. Genome Research (409 papers in training set; Top 3%; 1.2%)
18. PLOS Genetics (756 papers in training set; Top 11%; 1.2%)
19. Nature Communications (4913 papers in training set; Top 58%; 1.1%)
20. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 40%; 1.0%)
21. NAR Genomics and Bioinformatics (214 papers in training set; Top 3%; 0.8%)
22. Journal of Computational Biology (37 papers in training set; Top 0.5%; 0.8%)
23. Peer Community Journal (254 papers in training set; Top 4%; 0.8%)
24. PeerJ (261 papers in training set; Top 14%; 0.8%)
25. GENETICS (189 papers in training set; Top 1%; 0.7%)
26. Virus Evolution (140 papers in training set; Top 1%; 0.7%)
27. Science (429 papers in training set; Top 21%; 0.7%)
28. G3: Genes, Genomes, Genetics (222 papers in training set; Top 1%; 0.7%)
29. Philosophical Transactions of the Royal Society B (51 papers in training set; Top 6%; 0.7%)