STEQ: A statistically consistent quartet distance based species tree estimation method
Saha, P.; Saha, A.; Roddur, M. S.; Sikdar, S.; Anik, N. H.; Reaz, R.; Bayzid, M. S.
Accurate estimation of large-scale species trees from multilocus data in the presence of gene tree discordance remains a major challenge in phylogenomics. Although maximum likelihood, Bayesian, and statistically consistent summary methods can infer species trees with high accuracy, most of these methods are slow and do not scale to large numbers of taxa and genes. Distance-based estimation is one promising way to enable large-scale phylogeny estimation. Here, we present STEQ, a new statistically consistent, fast, and accurate distance-based method for estimating species trees from a collection of gene trees. We use a quartet-based distance metric that is statistically consistent under the multi-species coalescent (MSC) model. The running time of STEQ scales as O(kn² log n) for n taxa and k genes, which is asymptotically faster than leading summary methods such as ASTRAL. We evaluated STEQ against ASTRAL and wQFM-TREE, two of the most popular and accurate coalescent-based methods. Experiments on a collection of simulated and empirical datasets suggest that STEQ enables significantly faster species tree inference while maintaining accuracy competitive with the best current methods. STEQ is publicly available at https://github.com/prottoysaha99/STEQ.
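The abstract does not define the quartet-based distance, so the following is only a minimal sketch under stated assumptions, not STEQ's implementation. It assumes a distance in which, for a gene tree T and taxa x and y, rho_T(x, y) counts the quartets {x, y, z, w} whose induced topology is not xy|zw (that is, quartets in which x and y do not form a cherry). It then averages rho over k gene trees that share one taxon set. The tree format (nested Python tuples with string leaves) and the function names `leaf_path_lengths`, `quartet_distance_matrix`, and `average_quartet_distance` are hypothetical. Quartet topologies are read off with the four-point condition on edge-count path lengths. The enumeration is brute force, O(n⁴) per gene tree, so it does not reflect the O(kn² log n) running time claimed above.

```python
from collections import deque
from itertools import combinations, count


def leaf_path_lengths(tree):
    """Edge counts between all pairs of leaves of a nested-tuple tree.

    Hypothetical format: leaves are unique strings and every tuple is an
    internal node. The outermost tuple acts as a root vertex, which does not
    change any induced quartet topology.
    """
    adj = {}
    leaves = []
    ids = count()

    def build(node):
        if isinstance(node, str):
            leaves.append(node)
            adj.setdefault(node, [])
            return node
        nid = ("internal", next(ids))  # tuple ids never collide with leaf names
        adj[nid] = []
        for child in node:
            cid = build(child)
            adj[nid].append(cid)
            adj[cid].append(nid)
        return nid

    build(tree)
    dist = {}
    for src in leaves:
        # Breadth-first search gives edge counts from src to every vertex.
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for dst in leaves:
            dist[src, dst] = seen[dst]
    return leaves, dist


def quartet_distance_matrix(tree):
    """Assumed metric: rho(x, y) = number of quartets {x, y, z, w} not displaying xy|zw."""
    leaves, d = leaf_path_lengths(tree)
    rho = {(x, y): 0 for x in leaves for y in leaves}
    for a, b, c, e in combinations(sorted(leaves), 4):
        # Four-point condition: the displayed split has the strictly smallest sum.
        sums = {
            ((a, b), (c, e)): d[a, b] + d[c, e],
            ((a, c), (b, e)): d[a, c] + d[b, e],
            ((a, e), (b, c)): d[a, e] + d[b, c],
        }
        best = min(sums.values())
        winners = [split for split, total in sums.items() if total == best]
        cherries = set()
        if len(winners) == 1:  # resolved quartet; a star quartet has no cherries
            (p, q), (r, t) = winners[0]
            cherries = {frozenset((p, q)), frozenset((r, t))}
        for x, y in combinations((a, b, c, e), 2):
            if frozenset((x, y)) not in cherries:
                rho[x, y] += 1
                rho[y, x] += 1
    return rho


def average_quartet_distance(gene_trees):
    """Mean of rho over gene trees; assumes every tree has the same taxon set."""
    totals = None
    for tree in gene_trees:
        rho = quartet_distance_matrix(tree)
        if totals is None:
            totals = dict.fromkeys(rho, 0)
        for key, val in rho.items():
            totals[key] += val
    k = len(gene_trees)
    return {key: val / k for key, val in totals.items()}
```

For example, averaging over the gene trees ((A,B),(C,D)) and ((A,C),(B,D)) gives 0.5 for the pairs A,B and A,C and 1.0 for A,D, because A and D never form a cherry. In a distance-based pipeline, a matrix like this would then be handed to a distance-based tree builder such as neighbor joining. The abstract does not say which builder or which faster distance computation STEQ uses.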