Supporting per-locus substitution rates improves the accuracy of species networks and avoids spurious reticulations
Cao, Z.; Ogilvie, H.; Nakhleh, L.
The development of statistical methods to infer species phylogenies with reticulation (species networks) has led to many discoveries of gene flow between distinct species. However, because the dimensionality of species networks is not fixed, these methods may compensate for some kinds of model misspecification, such as assuming a single substitution rate for all genomic loci, by increasing the number of dimensions beyond the true value. The popular full Bayesian species network method MCMC_SEQ has previously made this assumption, so we have added support for the proven Dirichlet model for per-locus rates to enhance its accuracy and avoid spurious results. We studied the effects of this model using simulation and an empirical dataset from Heliconius butterflies. We found that assuming a single substitution rate for all loci leads to the inference of spurious reticulations in both simulated and empirical datasets when a full Bayesian method is used; however, the summary method InferNetwork_ML is robust to per-locus variation in substitution rates when set to ignore gene tree branch lengths. Our implementation of the model resolves this misspecification and successfully converges to the true species networks. It also infers far more accurate gene trees than either assuming a single rate or inferring gene trees independently. Our implementation of the Dirichlet per-locus rates model is now available in PhyloNet, an open-source software package for phylogenetic inference, on GitHub at https://github.com/NakhlehLab/PhyloNet.
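To make the idea of per-locus rates concrete, the following is a minimal sketch of how relative substitution rates might be drawn from a Dirichlet prior. It is not PhyloNet code: the function names, the symmetric concentration parameter `alpha`, and the convention of rescaling the rates so that their mean is 1 are all assumptions made for illustration, and the abstract does not state which parameterization the implementation uses.

```python
import random


def sample_locus_rates(n_loci, alpha=1.0, seed=None):
    """Draw per-locus relative rates from a symmetric Dirichlet prior.

    Hypothetical helper: a Dirichlet sample is built from independent
    Gamma(alpha, 1) draws normalized to sum to 1, then multiplied by
    n_loci so the rates average to 1 (an assumed convention).
    """
    rng = random.Random(seed)
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_loci)]
    total = sum(draws)
    return [n_loci * d / total for d in draws]


def scale_branch_lengths(branch_lengths, rate):
    """Convert gene tree branch lengths into expected substitutions
    for one locus by multiplying each by that locus's relative rate."""
    return [rate * b for b in branch_lengths]


# Five loci sharing the same toy gene tree branch lengths.
rates = sample_locus_rates(5, alpha=2.0, seed=1)
shared_lengths = [0.10, 0.25, 0.05]
per_locus_lengths = [scale_branch_lengths(shared_lengths, r) for r in rates]
```

Under a single-rate assumption every locus would use `rate = 1.0`, so any real variation in rate among loci would have to be absorbed elsewhere in the model, which is the kind of misspecification the abstract describes.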
Matching journals
The top 2 journals account for 50% of the predicted probability mass.