Algebraic invariants for inferring 4-leaf semi-directed phylogenetic networks

Martin, S.; Moulton, V.; Leggett, R. M.

2023-09-14 evolutionary biology
10.1101/2023.09.11.557152 bioRxiv

A core goal of phylogenomics is to determine the evolutionary history of a set of species from biological sequence data. Phylogenetic networks can describe more complex evolutionary phenomena than phylogenetic trees but are more difficult to reconstruct accurately. Recently, there has been growing interest in developing methods to infer semi-directed phylogenetic networks. Because inferring such networks can be computationally intensive, one approach is to puzzle together smaller networks. Thus, it is essential to have robust methods for inferring semi-directed phylogenetic networks on small numbers of taxa. In this paper, we investigate an algebraic method for inferring 4-leaf semi-directed phylogenetic networks from nucleotide sequence data by analysing the distribution of leaf-pattern probabilities. On simulated data, we found that we can identify the undirected phylogenetic network with high accuracy for sequences of length at least 10 kbp. Identifying the semi-directed network is more challenging and requires sequences of length approaching 10 Mbp. We are also able to use our approach to identify tree-like evolution and determine the underlying tree. Finally, we apply our method to a real dataset from Xiphophorus species and use the results to build a phylogenetic network.
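
The abstract refers to the distribution of leaf-pattern probabilities on four taxa. As a rough illustration only, the sketch below estimates those probabilities as empirical site-pattern frequencies from a 4-sequence alignment. It does not evaluate any algebraic invariants and is not the authors' implementation; the function name `leaf_pattern_frequencies`, the toy alignment, and the choice to skip sites containing gaps or ambiguity codes are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def leaf_pattern_frequencies(sequences):
    """Return the empirical frequency of each of the 4^4 = 256 site patterns.

    `sequences` is a list of exactly four aligned nucleotide strings.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(sequences) != 4:
        raise ValueError("expected exactly 4 aligned sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must have equal length")

    # zip(*sequences) yields one alignment column (site) at a time.
    columns = ("".join(col).upper() for col in zip(*sequences))

    # Assumption: sites with gaps or ambiguity codes are simply skipped.
    counts = Counter(c for c in columns if set(c) <= set(NUCLEOTIDES))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable sites in the alignment")

    # Report every pattern, including those never observed (frequency 0).
    patterns = ("".join(p) for p in product(NUCLEOTIDES, repeat=4))
    return {p: counts[p] / total for p in patterns}


# Toy alignment (made up): six sites, one of which contains a gap.
toy_alignment = ["ACGTAC", "ACGTAA", "ACGAAC", "ACG-AC"]
freqs = leaf_pattern_frequencies(toy_alignment)
```

With real data, the resulting 256-entry frequency vector would be the input on which candidate network invariants are evaluated; longer alignments give frequencies closer to the underlying pattern probabilities, which is consistent with the sequence-length requirements reported in the abstract.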

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Journal of Computational Biology: 37 papers in training set, Top 0.1%, 22.0%
2. Bioinformatics: 1061 papers in training set, Top 2%, 18.2%
3. PLOS Computational Biology: 1633 papers in training set, Top 4%, 8.2%
4. Systematic Biology: 121 papers in training set, Top 0.1%, 6.2%

50% of probability mass above

5. Methods in Ecology and Evolution: 160 papers in training set, Top 0.9%, 3.5%
6. PLOS ONE: 4510 papers in training set, Top 43%, 3.0%
7. Scientific Reports: 3102 papers in training set, Top 45%, 2.7%
8. BMC Bioinformatics: 383 papers in training set, Top 3%, 2.7%
9. BMC Ecology and Evolution: 49 papers in training set, Top 0.6%, 2.5%
10. Bulletin of Mathematical Biology: 84 papers in training set, Top 1.0%, 1.8%
11. BMC Genomics: 328 papers in training set, Top 2%, 1.7%
12. Genome Research: 409 papers in training set, Top 2%, 1.7%
13. Journal of Theoretical Biology: 144 papers in training set, Top 1%, 1.4%
14. Molecular Biology and Evolution: 488 papers in training set, Top 3%, 1.4%
15. Bioinformatics Advances: 184 papers in training set, Top 3%, 1.3%
16. Entropy: 20 papers in training set, Top 0.3%, 0.9%
17. PeerJ: 261 papers in training set, Top 13%, 0.9%
18. iScience: 1063 papers in training set, Top 28%, 0.9%
19. Peer Community Journal: 254 papers in training set, Top 4%, 0.8%
20. Journal of Molecular Evolution: 21 papers in training set, Top 0.4%, 0.8%
21. Ecological Informatics: 29 papers in training set, Top 0.8%, 0.7%
22. Genome Biology and Evolution: 280 papers in training set, Top 2%, 0.7%
23. Communications Biology: 886 papers in training set, Top 25%, 0.7%
24. Ecology and Evolution: 232 papers in training set, Top 4%, 0.7%
25. Nature Communications: 4913 papers in training set, Top 66%, 0.6%
26. NAR Genomics and Bioinformatics: 214 papers in training set, Top 4%, 0.6%