
The limits of Bayesian estimates of divergence times in measurably evolving populations

Ivanov, S.; Fosse, S.; dos Reis, M.; Duchene, S.

2026-03-03 bioinformatics
10.64898/2026.02.28.708707 bioRxiv

Bayesian inference of divergence times for extant species using molecular data is an unconventional statistical problem: divergence times and molecular rates are confounded, and only their product, the molecular branch length, is statistically identifiable. This means we must use priors on times and rates to break the identifiability problem. As a consequence, there is a lower bound on the uncertainty that can be attained under infinite data for estimates of evolutionary timescales using the molecular clock. With infinite data (i.e., an infinite number of sites and loci in the alignment), uncertainty in ages of nodes in phylogenies increases proportionally with their mean age, such that older nodes have higher uncertainty than younger nodes. On the other hand, if extinct taxa are present in the phylogeny, and if their sampling times are known (i.e., heterochronous data), then times and rates are identifiable and uncertainties of inferred times and rates go to zero with infinite data. However, in real heterochronous datasets (such as viruses and bacteria), alignments tend to be small, and how much uncertainty is present and how it can be reduced as a function of data size have not been explored. This is clearly important for our understanding of the tempo and mode of microbial evolution using the molecular clock. Here we conducted extensive simulation experiments and analyses of empirical data to develop the infinite-sites theory for heterochronous data. Contrary to expectations, we find that uncertainty in ages of internal nodes scales positively with the distance to their closest tip with known age (i.e., calibration age), not their absolute age. Our results also demonstrate that estimation uncertainty decreases with calibration age more slowly in data sets with more, rather than fewer, site patterns, although overall uncertainty is lower in the former.
Our statistical framework establishes the minimum uncertainty that can be attained with perfect calibrations and sequence data that are effectively infinitely informative. Finally, we discuss the implications for viral sequence data sets. In the vast majority of cases, viral data from outbreaks is not sufficiently informative to display infinite-sites behaviour, and thus all estimates of evolutionary timescales will be associated with a degree of uncertainty that will depend on the size of the data set, its information content, and the complexity of the model. We anticipate that our framework is useful to determine such theoretical limits in empirical analyses of microbial outbreaks.
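
The confounding of rates and times described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes two sequences sampled at the same time, the Jukes-Cantor (JC69) substitution model, and a hypothetical function name `pairwise_log_likelihood`. Under these assumptions the likelihood depends on rate and node age only through their product, so doubling the rate while halving the age leaves it unchanged.

```python
import math


def pairwise_log_likelihood(n_diff, n_sites, rate, node_age):
    """JC69 log-likelihood (up to an additive constant) for two sequences
    sampled at the same time whose common ancestor lived node_age ago.

    Illustrative assumption: both tips are contemporaneous, so the path
    between them has length 2 * rate * node_age substitutions per site.
    """
    distance = 2.0 * rate * node_age
    # JC69 probability that a site differs between the two sequences.
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))
    return n_diff * math.log(p_diff) + (n_sites - n_diff) * math.log(1.0 - p_diff)


# Two different (rate, age) pairs with the same product.
slow_old = pairwise_log_likelihood(12, 1000, rate=1e-3, node_age=10.0)
fast_young = pairwise_log_likelihood(12, 1000, rate=2e-3, node_age=5.0)

# The data cannot tell the two scenarios apart.
print(math.isclose(slow_old, fast_young, rel_tol=1e-9))
```

Because the two scenarios have the same likelihood, adding more sites sharpens the estimate of the product but not of the rate or the age separately. This is why priors, or tips with known sampling times, are needed to separate them.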

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology: 1633 papers in training set, Top 0.3%, 28.2%
2. Genetics: 225 papers in training set, Top 0.4%, 10.3%
3. Systematic Biology: 121 papers in training set, Top 0.1%, 6.5%
4. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 13%, 4.9%
5. Molecular Biology and Evolution: 488 papers in training set, Top 1%, 4.0%
50% of probability mass above
6. PLOS ONE: 4510 papers in training set, Top 38%, 3.7%
7. Bioinformatics: 1061 papers in training set, Top 6%, 3.1%
8. PeerJ: 261 papers in training set, Top 5%, 1.9%
9. Biophysical Journal: 545 papers in training set, Top 2%, 1.9%
10. Journal of Theoretical Biology: 144 papers in training set, Top 0.8%, 1.7%
11. Scientific Reports: 3102 papers in training set, Top 57%, 1.7%
12. mSystems: 361 papers in training set, Top 5%, 1.5%
13. Biometrics: 22 papers in training set, Top 0.1%, 1.5%
14. Physical Review E: 95 papers in training set, Top 0.8%, 1.5%
15. Journal of The Royal Society Interface: 189 papers in training set, Top 3%, 1.2%
16. Theoretical Population Biology: 47 papers in training set, Top 0.1%, 1.2%
17. F1000Research: 79 papers in training set, Top 3%, 1.1%
18. Physical Biology: 43 papers in training set, Top 2%, 0.9%
19. Nature Communications: 4913 papers in training set, Top 59%, 0.9%
20. Virus Evolution: 140 papers in training set, Top 1%, 0.9%
21. Biostatistics: 21 papers in training set, Top 0.1%, 0.8%
22. Journal of Bioinformatics and Systems Biology: 14 papers in training set, Top 0.6%, 0.8%
23. Bulletin of Mathematical Biology: 84 papers in training set, Top 2%, 0.8%
24. eLife: 5422 papers in training set, Top 57%, 0.8%
25. Journal of Molecular Evolution: 21 papers in training set, Top 0.4%, 0.8%
26. Methods in Ecology and Evolution: 160 papers in training set, Top 2%, 0.7%
27. Frontiers in Genetics: 197 papers in training set, Top 11%, 0.7%
28. mBio: 750 papers in training set, Top 12%, 0.7%
29. BMC Bioinformatics: 383 papers in training set, Top 8%, 0.7%
30. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 3%, 0.5%