
AmpliPhy improves gene trees by adding homologs without affecting alignments

Kim, D.; Gil, M.; Katoh, K.; Dessimoz, C.

2026-01-27 evolutionary biology
10.64898/2026.01.26.701724 bioRxiv

In phylogenomics, gene tree reconstruction depends on multiple sequence alignment (MSA) and tree inference, and ongoing work continues to improve inference quality. Denser taxon sampling has been associated with improved gene tree inference, suggesting that adding homologs could be a practical route to higher accuracy as sequence databases continue to expand. However, adding sequences can influence multiple steps of typical inference pipelines, and little is known about its specific effects on the MSA, tree reconstruction, and rooting steps. We performed a large-scale empirical benchmark to quantify how homolog enrichment affects alignment and phylogenetic inference. Using an enrichment-impoverishment design and a measure of tree accuracy based on taxonomic congruence, we found that enrichment consistently improves tree inference quality, while its effects on alignment quality are marginal. We show that this improvement is associated with accurate root placement on enriched trees when enrichment is accompanied by a sensitive homolog search. Notably, much of the benefit can be retained with relatively compact alignments produced by sequence addition. Building on these observations, we provide a tool, AmpliPhy, which efficiently improves phylogenetic reconstruction of protein families through homolog enrichment. The open-source AmpliPhy pipeline is available at https://github.com/DessimozLab/ampliphy.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Molecular Biology and Evolution | 488 papers in training set | Top 0.2% | 16.9%
2. Systematic Biology | 121 papers in training set | Top 0.1% | 10.1%
3. Bioinformatics | 1061 papers in training set | Top 3% | 9.8%
4. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 8% | 8.1%
5. PLOS Computational Biology | 1633 papers in training set | Top 6% | 6.2%
50% of probability mass above
6. Nature Communications | 4913 papers in training set | Top 31% | 6.1%
7. Methods in Ecology and Evolution | 160 papers in training set | Top 0.8% | 3.7%
8. PLOS Biology | 408 papers in training set | Top 5% | 3.0%
9. eLife | 5422 papers in training set | Top 30% | 3.0%
10. Science | 429 papers in training set | Top 10% | 2.8%
11. Genome Research | 409 papers in training set | Top 2% | 2.0%
12. Bioinformatics Advances | 184 papers in training set | Top 3% | 1.4%
13. Nature Computational Science | 50 papers in training set | Top 0.8% | 1.4%
14. Cell Systems | 167 papers in training set | Top 9% | 1.3%
15. NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 1.3%
16. Peer Community Journal | 254 papers in training set | Top 3% | 1.2%
17. BMC Bioinformatics | 383 papers in training set | Top 6% | 0.9%
18. Genome Biology | 555 papers in training set | Top 7% | 0.9%
19. Genetics | 225 papers in training set | Top 4% | 0.9%
20. Molecular Ecology Resources | 161 papers in training set | Top 0.9% | 0.9%
21. Scientific Reports | 3102 papers in training set | Top 74% | 0.8%
22. Nature Plants | 84 papers in training set | Top 2% | 0.8%
23. BMC Genomics | 328 papers in training set | Top 5% | 0.8%
24. Genome Biology and Evolution | 280 papers in training set | Top 2% | 0.8%
25. Virus Evolution | 140 papers in training set | Top 1% | 0.8%
26. Journal of Computational Biology | 37 papers in training set | Top 0.6% | 0.7%
27. Protein Science | 221 papers in training set | Top 2% | 0.7%
28. PeerJ | 261 papers in training set | Top 17% | 0.7%
29. Microbiome | 139 papers in training set | Top 4% | 0.6%
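
The "50% of probability mass" cutoff in the list above can be checked with a running sum over the listed probabilities. The following is a minimal sketch, not the site's own method: the function name `journals_needed` is hypothetical, and the values are simply the first six percentages copied from the list.

```python
from itertools import accumulate

# Predicted probabilities (%) for the six top-ranked journals, copied from the list above.
probabilities = [16.9, 10.1, 9.8, 8.1, 6.2, 6.1]


def journals_needed(probs, threshold):
    """Return how many top-ranked entries are needed for the running total to reach threshold.

    Hypothetical helper for illustration; returns None if the threshold is never reached.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_needed(probabilities, 50.0))  # prints 5
```

The first four journals sum to about 44.9% and the fifth brings the total to about 51.1%, which is consistent with the statement that the top 5 journals account for 50% of the predicted probability mass.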