Model selection in ADMIXTURE can be inconsistent: proof of the K=2 phenomenon

Do, D.; Terhorst, J.

2026-03-02 evolutionary biology
10.64898/2026.02.27.708651 bioRxiv

STRUCTURE and ADMIXTURE are two popular methods for detecting population structure in genetic data. They model observed genotypes as mixtures of latent ancestral populations, and the inferred admixture proportions can be used to visualize and summarize population structure. A key parameter in these models is the number of ancestral populations, K. Selecting K is a challenging problem. Perhaps the most widely used method is Evanno's ΔK, which selects K based on the second-order change in log-likelihood as K increases. However, practitioners have often noted that ΔK tends to favor overly small K, frequently returning K = 2 even when more meaningful substructure is present. In this paper, we provide a theoretical explanation for this phenomenon: we prove that, under certain conditions, the ΔK method can be inconsistent, meaning that it can fail to identify the true number of populations even with infinite data.
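
To make the "second-order change in log-likelihood" concrete, here is a minimal Python sketch of a ΔK-style statistic. It is an illustration only, not the authors' code or the output of any particular tool. It assumes the commonly used form in which, for each K, the absolute second difference of the mean log-likelihood across replicate runs is divided by the standard deviation of the log-likelihoods at that K. The function names (`evanno_delta_k`, `select_k`) and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(logliks):
    """Return {K: delta_K} for each K that has neighbours K-1 and K+1.

    `logliks` maps K to a list of log-likelihoods from replicate runs.
    Assumed form: |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    means = {k: mean(v) for k, v in logliks.items()}
    delta = {}
    for k in sorted(logliks):
        # The second difference needs both neighbours, so the smallest
        # and largest K tried never receive a score.
        if k - 1 not in means or k + 1 not in means:
            continue
        # A sample standard deviation needs at least two replicates.
        if len(logliks[k]) < 2:
            continue
        spread = stdev(logliks[k])
        if spread == 0:
            continue  # statistic undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / spread
    return delta


def select_k(logliks):
    """Pick the K with the largest delta_K score."""
    delta = evanno_delta_k(logliks)
    return max(delta, key=delta.get)


# Made-up log-likelihoods for K = 1..4, three replicate runs each.
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4400.0, -4404.0, -4396.0],
    4: [-4380.0, -4390.0, -4370.0],
}
scores = evanno_delta_k(runs)
best_k = select_k(runs)
```

With these made-up values the large jump in log-likelihood from K = 1 to K = 2 dominates the score, so the sketch picks K = 2 even though the log-likelihood keeps improving at K = 3 and K = 4. Note also that the statistic is undefined at the smallest and largest K tried, so it can never select them.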

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology, 1633 papers in training set, Top 2%, 12.5%
2. GENETICS, 189 papers in training set, Top 0.1%, 12.3%
3. Genetics, 225 papers in training set, Top 0.6%, 6.7%
4. Proceedings of the National Academy of Sciences, 2130 papers in training set, Top 12%, 6.3%
5. Bioinformatics, 1061 papers in training set, Top 5%, 4.8%
6. Theoretical Population Biology, 47 papers in training set, Top 0.1%, 3.9%
7. Molecular Biology and Evolution, 488 papers in training set, Top 1%, 3.5%
8. Journal of Computational Biology, 37 papers in training set, Top 0.1%, 3.5%

50% of probability mass above

9. The American Journal of Human Genetics, 206 papers in training set, Top 1%, 3.5%
10. Nature Communications, 4913 papers in training set, Top 42%, 3.2%
11. Genome Research, 409 papers in training set, Top 1%, 2.8%
12. Nature Computational Science, 50 papers in training set, Top 0.4%, 2.0%
13. Science, 429 papers in training set, Top 13%, 1.9%
14. Cell Systems, 167 papers in training set, Top 6%, 1.9%
15. Systematic Biology, 121 papers in training set, Top 0.3%, 1.7%
16. PLOS Genetics, 756 papers in training set, Top 9%, 1.7%
17. Communications Biology, 886 papers in training set, Top 11%, 1.5%
18. Bulletin of Mathematical Biology, 84 papers in training set, Top 1%, 1.3%
19. eLife, 5422 papers in training set, Top 48%, 1.3%
20. Scientific Reports, 3102 papers in training set, Top 65%, 1.3%
21. Science Advances, 1098 papers in training set, Top 22%, 1.3%
22. PLOS ONE, 4510 papers in training set, Top 59%, 1.3%
23. Journal of The Royal Society Interface, 189 papers in training set, Top 4%, 1.1%
24. Virus Evolution, 140 papers in training set, Top 1%, 1.1%
25. Evolution, 199 papers in training set, Top 2%, 0.9%
26. Ecology Letters, 121 papers in training set, Top 1%, 0.8%
27. Frontiers in Genetics, 197 papers in training set, Top 10%, 0.7%
28. Philosophical Transactions of the Royal Society B, 51 papers in training set, Top 6%, 0.7%
29. Genome Biology, 555 papers in training set, Top 9%, 0.6%
30. Peer Community Journal, 254 papers in training set, Top 5%, 0.6%