Distribution of Gene Tree Topologies with Duplication, Loss, and Coalescence
Mishra, S.; Hahn, M. W.
Motivation
Many methods can be used to infer the number and timing of gene duplication and loss events from gene trees. Most such reconciliation methods use a model of gene duplication that does not include the coalescent process, or that restricts it in important ways. As a result, when coalescence changes a gene tree topology, these methods explain the change with extra duplications and losses, events that did not actually occur.

Results
Here, we present results from the multispecies coalescent with duplication and loss (MSC-DL) model, which allows for unrestricted interaction between duplication, loss, and coalescence. Theoretical results show that even histories with only a single duplication event can lead to many more trees than are normally considered: for a species tree with 2 tips, 9 trees are possible, while with 6 tips, more than 19 million trees are possible. Adding even a single loss almost doubles the number of possible topologies. We calculate exactly the probabilities of different topologies and their branch lengths under the MSC-DL for trees with two species, and we provide an approach for calculating such probabilities on larger trees. These results have important implications for the accuracy of reconciliation methods and ortholog identification methods, and for our understanding of the evolutionary histories of duplication and loss.

Supplementary Information
Supplementary materials are available at https://github.com/smishra677/Distribution-of-Gene-Tree-Topologies-with-Duplication-Loss-and-Coalescence.