Fast and Accurate Species Trees from Weighted Internode Distances

Liu, B.; Warnow, T.

2022-05-26 evolutionary biology
10.1101/2022.05.24.493312 bioRxiv
Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., "gene tree heterogeneity"). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing "gene trees") and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low-degree polynomial time), and have shown high accuracy under many conditions, making them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work on weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. Our experimental study evaluating weighted ASTRID shows improvements in accuracy compared to the original (unweighted) ASTRID while remaining fast. Moreover, weighted ASTRID shows competitive accuracy against weighted ASTRAL, the state of the art. Thus, this study provides a new and very fast method for species tree estimation that improves upon ASTRID and has accuracy comparable to the state of the art while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode.
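
The unweighted internode distance described in the abstract can be illustrated with a minimal Python sketch. This is not the ASTRID or weighted ASTRID implementation. It assumes a hypothetical tree encoding (nested tuples with leaf names as strings, written with a trifurcating root as is usual for unrooted trees), and the function names `leaf_paths`, `internode_distances`, and `average_internode_distances` are invented for illustration. The weighting by branch uncertainty is not shown because the abstract does not specify it.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf name to the tuple of child indices leading to it from the root."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths


def internode_distances(tree):
    """Count the internal nodes on the path between every pair of leaves."""
    paths = leaf_paths(tree)
    dist = {}
    for x, y in combinations(sorted(paths), 2):
        px, py = paths[x], paths[y]
        # c = depth of the lowest common ancestor (length of the shared prefix).
        c = 0
        while c < min(len(px), len(py)) and px[c] == py[c]:
            c += 1
        # Internal nodes below the LCA on each side, plus the LCA itself.
        dist[(x, y)] = (len(px) - c - 1) + (len(py) - c - 1) + 1
    return dist


def average_internode_distances(gene_trees):
    """Average each pairwise distance over the gene trees that contain both species."""
    totals, counts = {}, {}
    for tree in gene_trees:
        for pair, d in internode_distances(tree).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two toy gene trees on species A-D (assumed example data).
gene_trees = [("A", "B", ("C", "D")), ("A", "C", ("B", "D"))]
avg = average_internode_distances(gene_trees)
```

The resulting dictionary plays the role of the average internode distance matrix, which a distance-based method such as neighbor joining would then take as input.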

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set, Top 0.7%, 32.2%
2. Journal of Computational Biology: 37 papers in training set, Top 0.1%, 12.1%
3. Bioinformatics Advances: 184 papers in training set, Top 0.5%, 6.2%
50% of probability mass above
4. BMC Bioinformatics: 383 papers in training set, Top 2%, 6.2%
5. PLOS Computational Biology: 1633 papers in training set, Top 8%, 4.2%
6. PLOS ONE: 4510 papers in training set, Top 41%, 3.5%
7. Genome Research: 409 papers in training set, Top 1%, 2.5%
8. BMC Genomics: 328 papers in training set, Top 2%, 1.8%
9. PeerJ: 261 papers in training set, Top 7%, 1.7%
10. Methods in Ecology and Evolution: 160 papers in training set, Top 1%, 1.7%
11. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.6%
12. Systematic Biology: 121 papers in training set, Top 0.3%, 1.5%
13. BMC Ecology and Evolution: 49 papers in training set, Top 1%, 1.5%
14. Scientific Reports: 3102 papers in training set, Top 63%, 1.5%
15. Briefings in Bioinformatics: 326 papers in training set, Top 5%, 1.3%
16. Molecular Biology and Evolution: 488 papers in training set, Top 3%, 1.2%
17. Genome Biology: 555 papers in training set, Top 6%, 0.9%
18. iScience: 1063 papers in training set, Top 25%, 0.9%
19. Genome Biology and Evolution: 280 papers in training set, Top 2%, 0.9%
20. PLOS Genetics: 756 papers in training set, Top 16%, 0.7%
21. The Plant Journal: 197 papers in training set, Top 3%, 0.7%
22. Peer Community Journal: 254 papers in training set, Top 5%, 0.6%