
Alignment Free Phylogeny Construction Using Maximum Likelihood Using k-mer Counts

Rahman, A. T. M. M.; Habib, S.; Islam, M. M.; Rahman, K. M.; Rahman, A.

bioRxiv, 2023-12-07 (evolutionary biology)
DOI: 10.1101/2023.12.05.570306

Estimating phylogenetic trees from molecular data often involves first performing a multiple sequence alignment and then identifying the tree that maximizes the likelihood computed under a model of nucleotide substitution. However, sequence alignment is computationally challenging for long sequences, especially in the presence of genomic rearrangements. To address this, methods that construct phylogenetic trees without aligning the sequences (i.e., alignment-free methods) have been proposed. They are generally fast and can be used to construct phylogenetic trees for a large number of species, but they primarily estimate phylogenies by computing pairwise distances and are not based on statistical models of molecular evolution. In this paper, we introduce a model of k-mer frequency change based on a birth-death-migration process, which can be used to estimate maximum likelihood phylogenies from the frequencies of k-mers in the genomic sequences of species, without alignment. Experiments on real and simulated data demonstrate the efficacy of the model for likelihood-based alignment-free phylogeny construction.
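The input the abstract refers to, per-species k-mer frequencies, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `count_kmers` and `kmer_profile`, the use of overlapping windows, and the skipping of windows that contain non-ACGT characters are assumptions made for illustration. The birth-death-migration model itself is not reproduced here; the sketch only builds the kind of species-by-k-mer count table such a likelihood model would presumably take as data.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in a DNA sequence (illustrative helper, not from the paper)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows containing ambiguous bases such as N are skipped.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_profile(genomes: dict[str, str], k: int) -> tuple[list[str], dict[str, list[int]]]:
    """Build a species-by-k-mer count table over the union of observed k-mers (hypothetical helper)."""
    per_species = {name: count_kmers(s, k) for name, s in genomes.items()}
    # Sorted union of every k-mer seen in any species gives a shared column order.
    all_kmers = sorted(set().union(*per_species.values()))
    # Counter returns 0 for k-mers a species lacks, so every row has the same length.
    table = {name: [c[km] for km in all_kmers] for name, c in per_species.items()}
    return all_kmers, table


genomes = {"A": "ACGTACGT", "B": "ACGTTCGA"}
kmers, table = kmer_profile(genomes, k=3)
print(kmers)       # ['ACG', 'CGA', 'CGT', 'GTA', 'GTT', 'TAC', 'TCG', 'TTC']
print(table["A"])  # [2, 0, 2, 1, 0, 1, 0, 0]
```

Filling absent k-mers with zero keeps every species on the same set of columns, which is convenient when each column is later treated as one observed character evolving along the tree.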

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. PLOS ONE: 12.6% (4510 papers in training set; Top 15%)
2. Journal of Computational Biology: 10.2% (37 papers in training set; Top 0.1%)
3. PLOS Computational Biology: 10.2% (1633 papers in training set; Top 3%)
4. Scientific Reports: 8.5% (3102 papers in training set; Top 10%)
5. Bioinformatics: 6.9% (1061 papers in training set; Top 4%)
6. BMC Bioinformatics: 4.4% (383 papers in training set; Top 2%)
(50% of probability mass above)
7. Systematic Biology: 4.3% (121 papers in training set; Top 0.2%)
8. BMC Ecology and Evolution: 4.0% (49 papers in training set; Top 0.3%)
9. PeerJ: 3.1% (261 papers in training set; Top 3%)
10. Journal of Theoretical Biology: 2.6% (144 papers in training set; Top 0.5%)
11. Journal of Molecular Evolution: 2.1% (21 papers in training set; Top 0.1%)
12. Molecular Phylogenetics and Evolution: 2.1% (61 papers in training set; Top 0.1%)
13. BMC Genomics: 1.7% (328 papers in training set; Top 2%)
14. Methods in Ecology and Evolution: 1.7% (160 papers in training set; Top 1%)
15. Infection, Genetics and Evolution: 1.7% (43 papers in training set; Top 0.4%)
16. Communications Biology: 1.3% (886 papers in training set; Top 12%)
17. Molecular Biology and Evolution: 1.2% (488 papers in training set; Top 3%)
18. Genes: 0.8% (126 papers in training set; Top 3%)
19. Ecology and Evolution: 0.8% (232 papers in training set; Top 4%)
20. Briefings in Bioinformatics: 0.7% (326 papers in training set; Top 7%)
21. PLOS Genetics: 0.7% (756 papers in training set; Top 16%)
22. F1000Research: 0.7% (79 papers in training set; Top 5%)
23. Philosophical Transactions of the Royal Society B: 0.7% (51 papers in training set; Top 6%)
24. Bulletin of Mathematical Biology: 0.6% (84 papers in training set; Top 2%)
25. Frontiers in Ecology and Evolution: 0.6% (60 papers in training set; Top 4%)
26. Journal of Systematics and Evolution: 0.5% (11 papers in training set; Top 0.4%)
27. Nature Communications: 0.5% (4913 papers in training set; Top 67%)
28. Ecological Informatics: 0.5% (29 papers in training set; Top 1.0%)
29. Genome Biology and Evolution: 0.5% (280 papers in training set; Top 2%)