Signal, noise, and bias in phylogenetic inference: potential and limits to the resolution of phylogenetic trees in the phylogenomic era

Dornburg, A.; Su, Z. T.; Jin, Y.; Fisk, N.; Townsend, J. P.

2026-04-01 evolutionary biology
10.64898/2026.03.30.714540 bioRxiv

Phylogenomic datasets assembled to resolve the Tree of Life now routinely span thousands of loci comprising millions of characters. Yet the persistence of incongruent topologies across such datasets reveals a fundamental truth of phylogenetics: not all data are equally informative. Here we derive analytical approaches that predict the relative impacts of phylogenetic signal, stochastic noise, and systematic bias on phylogenetic inference. We show that these three components exhibit divergent scaling properties with character sampling: signal and bias accumulate linearly, while noise accumulates nonlinearly with a concave trajectory. For some phylogenetic problems, substantial amounts of phylogenetic noise may eventually be overwhelmed by signal. For other phylogenetic problems--especially those involving deep divergences, short internodes, or constrained character-state space--the slope of signal accumulation can be so shallow that even signal from genome-scale data may never practically exceed noise. Moreover, linear accumulation of phylogenetic bias can in principle continuously overwhelm accumulation of signal at a lower slope with additional characters, regardless of dataset size. Applying our theory to empirical datasets, we show that anchored hybrid enrichment and ultraconserved element loci, like any loci, can exhibit signal that is overwhelmed by noise, and that character acquisition biases in some loci can further confound inference. Given the pervasive nature of incongruence in the phylogenomic era, our work provides a theoretical foundation for understanding the limits of inference, improving experimental design, and guiding efficient and accurate resolution of the Tree of Life.
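The scaling argument in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' derivation: it assumes, purely for illustration, that noise grows as the square root of the number of characters (one possible concave form), and every slope and coefficient is a hypothetical value chosen to show the qualitative behavior.

```python
import math

# Toy model of the scaling described in the abstract.
# ASSUMPTIONS (not from the paper): noise grows as c * sqrt(n), and all
# parameter values below are hypothetical, chosen only for illustration.

def signal(n, a):
    """Signal accumulates linearly with the number of characters n."""
    return a * n

def bias(n, b):
    """Bias also accumulates linearly with n."""
    return b * n

def noise(n, c):
    """Noise accumulates along a concave curve (assumed square-root form)."""
    return c * math.sqrt(n)

def crossover(a, c):
    """Number of characters at which a * n catches up with c * sqrt(n).

    Solving a * n = c * sqrt(n) gives n = (c / a) ** 2. With no signal
    (a <= 0) the crossover is never reached.
    """
    if a <= 0:
        return math.inf
    return (c / a) ** 2

# A steep signal slope overtakes noise with a modest number of characters.
steep = crossover(a=0.05, c=2.0)

# A slope 100 times shallower needs 10,000 times as many characters.
shallow = crossover(a=0.0005, c=2.0)

# If the bias slope exceeds the signal slope, the gap (b - a) * n only
# widens as characters are added.
gap_small = bias(1_000, 0.08) - signal(1_000, 0.05)
gap_large = bias(1_000_000, 0.08) - signal(1_000_000, 0.05)

print(f"steep-slope crossover:   {steep:,.0f} characters")
print(f"shallow-slope crossover: {shallow:,.0f} characters")
print(f"bias minus signal at 1e3 vs 1e6 characters: {gap_small:.1f} vs {gap_large:.1f}")
```

Under these assumed forms, the crossover point scales with the inverse square of the signal slope, so a shallow slope can push the required dataset size beyond what is practical, and no dataset size rescues signal from a bias term with a steeper slope.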

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1 | Science | 429 papers in training set | Top 0.1% | 26.9%
2 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 0.4% | 23.4%
50% of probability mass above
3 | Molecular Biology and Evolution | 488 papers in training set | Top 0.6% | 7.1%
4 | Systematic Biology | 121 papers in training set | Top 0.2% | 4.1%
5 | Nature Communications | 4913 papers in training set | Top 36% | 4.1%
6 | Nature | 575 papers in training set | Top 8% | 3.2%
7 | Evolution | 199 papers in training set | Top 1% | 2.2%
8 | PLOS Computational Biology | 1633 papers in training set | Top 13% | 2.2%
9 | eLife | 5422 papers in training set | Top 34% | 2.2%
10 | Genetics | 225 papers in training set | Top 2% | 2.0%
11 | Science Advances | 1098 papers in training set | Top 16% | 1.8%
12 | Neuron | 282 papers in training set | Top 5% | 1.8%
13 | Nature Ecology & Evolution | 113 papers in training set | Top 3% | 1.5%
14 | Cell Systems | 167 papers in training set | Top 8% | 1.4%
15 | Ecology Letters | 121 papers in training set | Top 1% | 0.9%
16 | Proceedings of the Royal Society B: Biological Sciences | 341 papers in training set | Top 6% | 0.8%
17 | Nature Plants | 84 papers in training set | Top 2% | 0.8%
18 | Nature Genetics | 240 papers in training set | Top 7% | 0.8%
19 | Scientific Reports | 3102 papers in training set | Top 72% | 0.8%
20 | Bulletin of Mathematical Biology | 84 papers in training set | Top 2% | 0.8%
21 | Current Biology | 596 papers in training set | Top 13% | 0.8%
22 | Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 5% | 0.8%
23 | The ISME Journal | 194 papers in training set | Top 2% | 0.7%
24 | Royal Society Open Science | 193 papers in training set | Top 5% | 0.7%
25 | Neural Computation | 36 papers in training set | Top 0.7% | 0.7%
26 | Journal of Biosciences | 12 papers in training set | Top 0.2% | 0.7%
27 | PLOS Digital Health | 91 papers in training set | Top 3% | 0.7%