
Automatic Discovery of Optimal Discrete Character Models

Boyko, J.

2025-12-03 evolutionary biology
10.1101/2024.11.15.623760 bioRxiv

Modeling discrete character evolution in a Markovian framework has become common practice in phylogenetic comparative methods. The increasing size and complexity of these models reflects a trend toward analyses that include more taxa and more discrete characters. However, as the complexity of the models increases, so do the number of potential model structures and the number of estimable parameters, making it nearly impossible to consider all modeling options for a given dataset. To overcome this issue, I apply a combination of regularization and simulated annealing to models of discrete character evolution. This allows automatic searching and optimization across different model structures without user specification. I test this framework under several simulation scenarios including hidden rates and multiple discrete characters. The results indicate that regularized models significantly outperform traditional approaches, yielding a far lower variance and a nearly tenfold reduction in the overall error of parameter estimates in the most extreme scenarios. I illustrate the power of automatic model selection by revisiting the ancestral state estimation of concealed ovulation and mating systems in Old World monkeys. Using the dredge algorithm, I discover a previously unexamined model structure that has both better statistical performance and a differing ancestral state reconstruction when compared to default model sets. In general, these results highlight the dangers of an over-reliance on default model sets. The combination of automatic model selection and regularization helps overcome problems of over-parameterization, and these results demonstrate that when inferences are drawn from a larger model space, they can be both more statistically robust and biologically realistic.
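To make the idea of searching over model structures more concrete, here is a minimal sketch of simulated annealing over which rate parameters a model keeps. It is not the dredge algorithm or any implementation from the preprint. The names `toy_neg_log_lik`, `penalized_score`, and `anneal` are hypothetical, the toy objective replaces a real phylogenetic likelihood, and a fixed per-parameter penalty stands in for the regularization term, which the abstract does not specify.

```python
import math
import random


def toy_neg_log_lik(structure):
    """Hypothetical stand-in for a fitted model's negative log-likelihood.

    Dropping rate 0 or 2 hurts the fit badly; dropping any other rate barely matters.
    """
    important = {0, 2}
    return sum(5.0 if i in important else 0.1
               for i, keep in enumerate(structure) if not keep)


def penalized_score(structure, neg_log_lik, penalty=1.0):
    """Fit term plus a fixed penalty for every free rate parameter kept."""
    return neg_log_lik(structure) + penalty * sum(structure)


def anneal(n_rates, neg_log_lik, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search over keep/drop structures with simulated annealing."""
    rng = random.Random(seed)
    current = tuple(True for _ in range(n_rates))  # start from the full model
    current_score = penalized_score(current, neg_log_lik)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose toggling one randomly chosen rate parameter.
        i = rng.randrange(n_rates)
        proposal = current[:i] + (not current[i],) + current[i + 1:]
        proposal_score = penalized_score(proposal, neg_log_lik)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with Metropolis probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = proposal, proposal_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


best, best_score = anneal(4, toy_neg_log_lik)
print(best, round(best_score, 3))
```

For this toy objective the lowest possible score is 2.2, reached by keeping only rates 0 and 2. In a real analysis the stand-in objective would be replaced by the penalized likelihood of a discrete character model on a phylogeny.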

Matching journals

The top 2 journals together account for at least 50% of the predicted probability mass (44.2% + 9.7% = 53.9%).

1. Systematic Biology: 121 papers in training set, Top 0.1%, 44.2%
2. Methods in Ecology and Evolution: 160 papers in training set, Top 0.4%, 9.7%
50% of probability mass above
3. Molecular Biology and Evolution: 488 papers in training set, Top 0.6%, 7.2%
4. PLOS Computational Biology: 1633 papers in training set, Top 8%, 4.4%
5. BMC Ecology and Evolution: 49 papers in training set, Top 0.5%, 3.5%
6. Molecular Ecology Resources: 161 papers in training set, Top 0.5%, 2.0%
7. Peer Community Journal: 254 papers in training set, Top 1%, 2.0%
8. PeerJ: 261 papers in training set, Top 6%, 1.8%
9. PLOS ONE: 4510 papers in training set, Top 52%, 1.8%
10. PLOS Biology: 408 papers in training set, Top 9%, 1.8%
11. Molecular Phylogenetics and Evolution: 61 papers in training set, Top 0.2%, 1.6%
12. Bioinformatics: 1061 papers in training set, Top 8%, 1.4%
13. Journal of Theoretical Biology: 144 papers in training set, Top 1%, 1.0%
14. Evolution: 199 papers in training set, Top 2%, 0.9%
15. Ecology and Evolution: 232 papers in training set, Top 3%, 0.9%
16. PLOS Genetics: 756 papers in training set, Top 14%, 0.8%
17. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 42%, 0.8%
18. Journal of Evolutionary Biology: 98 papers in training set, Top 0.9%, 0.8%
19. Communications Biology: 886 papers in training set, Top 23%, 0.8%
20. New Phytologist: 309 papers in training set, Top 5%, 0.7%
21. Journal of Computational Biology: 37 papers in training set, Top 0.8%, 0.5%
22. Genetics: 225 papers in training set, Top 5%, 0.5%
23. Royal Society Open Science: 193 papers in training set, Top 6%, 0.5%
24. Philosophical Transactions of the Royal Society B: 51 papers in training set, Top 7%, 0.5%
25. Genome Biology and Evolution: 280 papers in training set, Top 2%, 0.5%
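
The 50% cutoff in this list can be checked with a short cumulative sum. The sketch below copies the predicted probabilities of the five highest-ranked journals from the list. The function name `journals_to_reach` is a hypothetical helper, not part of the site that produced these predictions.

```python
# Predicted probabilities (percent) for the five highest-ranked journals,
# copied from the list above.
ranked = [
    ("Systematic Biology", 44.2),
    ("Methods in Ecology and Evolution", 9.7),
    ("Molecular Biology and Evolution", 7.2),
    ("PLOS Computational Biology", 4.4),
    ("BMC Ecology and Evolution", 3.5),
]


def journals_to_reach(ranked, threshold):
    """Return how many top-ranked journals are needed to reach `threshold`
    percent of cumulative probability, and the cumulative total at that point.

    Returns (None, total) if the listed journals never reach the threshold.
    """
    total = 0.0
    for count, (_, prob) in enumerate(ranked, start=1):
        total += prob
        if total >= threshold:
            return count, total
    return None, total


count, total = journals_to_reach(ranked, 50.0)
print(count, round(total, 1))
```

The first journal alone (44.2%) falls short of 50%, so the cutoff lands after the second entry.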