STEQ: A statistically consistent quartet distance based species tree estimation method

Saha, P.; Saha, A.; Roddur, M. S.; Sikdar, S.; Anik, N. H.; Reaz, R.; Bayzid, M. S.

2026-03-02 bioinformatics
10.64898/2026.02.27.708511 bioRxiv

Accurate estimation of large-scale species trees from multilocus data in the presence of gene tree discordance remains a major challenge in phylogenomics. Although maximum likelihood, Bayesian, and statistically consistent summary methods can infer species trees with high accuracy, most of these methods are slow and do not scale to large numbers of taxa and genes. Distance-based estimation is one promising route to large-scale phylogeny estimation. Here, we present STEQ, a new statistically consistent, fast, and accurate distance-based method to estimate species trees from a collection of gene trees. STEQ uses a quartet-based distance metric that is statistically consistent under the multi-species coalescent (MSC) model. The running time of STEQ scales as O(kn^2 log n) for n taxa and k genes, which is asymptotically faster than leading summary-based methods such as ASTRAL. We evaluated the performance of STEQ in comparison with ASTRAL and wQFM-TREE, two of the most popular and accurate coalescent-based methods. Experimental findings on a collection of simulated and empirical datasets suggest that STEQ enables significantly faster inference of species trees while maintaining accuracy competitive with the best current methods. STEQ is publicly available at https://github.com/prottoysaha99/STEQ.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1 | Bioinformatics | 1061 papers in training set | Top 0.6% | 33.6%
2 | BMC Bioinformatics | 383 papers in training set | Top 0.6% | 12.7%
3 | PLOS Computational Biology | 1633 papers in training set | Top 5% | 6.9%
(50% of probability mass above)
4 | Systematic Biology | 121 papers in training set | Top 0.1% | 6.5%
5 | Genome Research | 409 papers in training set | Top 0.5% | 4.9%
6 | Journal of Computational Biology | 37 papers in training set | Top 0.1% | 3.7%
7 | Bioinformatics Advances | 184 papers in training set | Top 2% | 2.8%
8 | PLOS ONE | 4510 papers in training set | Top 47% | 2.1%
9 | Molecular Biology and Evolution | 488 papers in training set | Top 2% | 2.1%
10 | Nature Communications | 4913 papers in training set | Top 50% | 1.7%
11 | Methods in Ecology and Evolution | 160 papers in training set | Top 1% | 1.7%
12 | PeerJ | 261 papers in training set | Top 9% | 1.4%
13 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.4%
14 | Communications Biology | 886 papers in training set | Top 14% | 1.2%
15 | Scientific Reports | 3102 papers in training set | Top 68% | 1.1%
16 | PLOS Genetics | 756 papers in training set | Top 12% | 1.1%
17 | BMC Genomics | 328 papers in training set | Top 4% | 1.1%
18 | Genetics | 225 papers in training set | Top 3% | 0.9%
19 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 5% | 0.8%
20 | Peer Community Journal | 254 papers in training set | Top 4% | 0.8%
21 | Nature Methods | 336 papers in training set | Top 7% | 0.7%
22 | Nature Computational Science | 50 papers in training set | Top 2% | 0.7%
23 | iScience | 1063 papers in training set | Top 37% | 0.7%
24 | Genome Biology and Evolution | 280 papers in training set | Top 2% | 0.7%
25 | Nucleic Acids Research | 1128 papers in training set | Top 21% | 0.5%
26 | Frontiers in Genetics | 197 papers in training set | Top 12% | 0.5%
27 | Nature Genetics | 240 papers in training set | Top 9% | 0.5%
28 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 5% | 0.5%
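
The 50% cutoff in the list can be checked by summing the predicted probabilities in rank order. The following is a minimal sketch, not part of the prediction tool itself: the helper name `journals_to_reach` is hypothetical, and only the five top-ranked probabilities from the list are included.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the five top-ranked journals,
# copied from the list above. Lower-ranked entries are omitted.
probs = [
    ("Bioinformatics", 33.6),
    ("BMC Bioinformatics", 12.7),
    ("PLOS Computational Biology", 6.9),
    ("Systematic Biology", 6.5),
    ("Genome Research", 4.9),
]


def journals_to_reach(entries, threshold):
    """Hypothetical helper: number of top-ranked entries whose
    cumulative probability first reaches `threshold` (percent)."""
    running = accumulate(p for _, p in entries)
    for count, total in enumerate(running, start=1):
        if total >= threshold:
            return count
    return None  # threshold not reached by the listed entries


top_k = journals_to_reach(probs, 50.0)
print(top_k)  # prints 3
```

The first two journals sum to 46.3%, so the third (6.9%) is the one that carries the cumulative total past 50%.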