
Using Variable Window Sizes for Phylogenomic Analyses of Whole Genome Alignments

Ivan, J.; Lanfear, R.

2026-03-06 bioinformatics
10.64898/2026.03.04.709403 bioRxiv

Abstract: Many phylogenomic studies have used non-overlapping windows to address gene tree discordance across a set of aligned genomes. Recently, Ivan et al. (2025) proposed an information-theoretic approach to choose an optimal window size given the alignment. However, this approach selects only a single fixed window size per chromosome, which is a useful first step but fails to account for variation in the size of non-recombining regions along each chromosome. Such variation is expected because recombination is stochastic and because recombination rates vary along chromosomes. In this study, we extend the approach of Ivan et al. (2025) to allow window sizes to vary across the chromosome, using a splitting-and-merging strategy that allows each window to be of arbitrary length. We show that the new method outperforms the fixed-window approach in recovering gene tree topologies on a wide range of simulated datasets. Applying the new method to the genomes of seven Heliconius butterflies, we found that the average window sizes for the group ranged from 538 bp to 808 bp, with a distribution of gene tree topologies very similar to that of previous studies that used fixed window sizes. For the genomes of great apes, the average window sizes ranged from 4.2 kb to 6.2 kb, with the proportion of the major topology (i.e., grouping human and chimpanzee together) reaching approximately 80%. In conclusion, our study highlights the limitations of using a fixed window size when recombination rates vary across chromosomes, and proposes a splitting-and-merging approach that allows for variable window sizes across whole genome alignments.
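As a point of reference for the fixed-window baseline mentioned in the abstract, the following minimal sketch partitions an alignment of a given length into non-overlapping windows of one fixed size. It does not reproduce the authors' splitting-and-merging method or their window-size selection criterion, neither of which is specified here; the function name `fixed_windows` and its parameters are hypothetical and chosen only for illustration.

```python
def fixed_windows(alignment_length, window_size):
    """Return half-open (start, end) column ranges that tile the alignment.

    Hypothetical helper: every window has `window_size` columns except
    possibly the last one, which is truncated at the alignment end.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        (start, min(start + window_size, alignment_length))
        for start in range(0, alignment_length, window_size)
    ]


# Example: a 10-column alignment cut into windows of 4 columns.
print(fixed_windows(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Under a variable-window scheme such as the one the abstract describes, the boundaries would instead differ in spacing along the chromosome.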

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. BMC Bioinformatics; 383 papers in training set; Top 0.7%; 12.0%
2. Systematic Biology; 121 papers in training set; Top 0.1%; 9.7%
3. Bioinformatics; 1061 papers in training set; Top 4%; 6.9%
4. PLOS Computational Biology; 1633 papers in training set; Top 5%; 6.6%
5. PeerJ; 261 papers in training set; Top 0.8%; 6.1%
6. BMC Genomics; 328 papers in training set; Top 0.4%; 6.1%
7. Molecular Ecology Resources; 161 papers in training set; Top 0.2%; 4.7%
50% of probability mass above
8. G3: Genes, Genomes, Genetics; 222 papers in training set; Top 0.2%; 3.5%
9. Genome Biology and Evolution; 280 papers in training set; Top 0.5%; 3.5%
10. Methods in Ecology and Evolution; 160 papers in training set; Top 0.9%; 3.5%
11. Molecular Biology and Evolution; 488 papers in training set; Top 2%; 3.5%
12. Bioinformatics Advances; 184 papers in training set; Top 2%; 3.0%
13. Journal of Computational Biology; 37 papers in training set; Top 0.1%; 2.8%
14. Frontiers in Genetics; 197 papers in training set; Top 3%; 2.3%
15. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 1%; 2.0%
16. NAR Genomics and Bioinformatics; 214 papers in training set; Top 2%; 1.8%
17. Journal of Molecular Evolution; 21 papers in training set; Top 0.1%; 1.8%
18. F1000Research; 79 papers in training set; Top 2%; 1.6%
19. Briefings in Bioinformatics; 326 papers in training set; Top 5%; 1.3%
20. PLOS ONE; 4510 papers in training set; Top 61%; 1.2%
21. Scientific Reports; 3102 papers in training set; Top 68%; 1.2%
22. Molecular Phylogenetics and Evolution; 61 papers in training set; Top 0.3%; 1.1%
23. GigaScience; 172 papers in training set; Top 3%; 0.9%
24. Peer Community Journal; 254 papers in training set; Top 3%; 0.9%
25. Genome Research; 409 papers in training set; Top 4%; 0.7%
26. Virus Evolution; 140 papers in training set; Top 2%; 0.6%
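
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. This is a minimal sketch: the percentages are copied from the ranking above in rank order, and the helper name `ranks_to_reach` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1 to 26, copied from the ranking above.
probs = [12.0, 9.7, 6.9, 6.6, 6.1, 6.1, 4.7, 3.5, 3.5, 3.5, 3.5, 3.0, 2.8,
         2.3, 2.0, 1.8, 1.8, 1.6, 1.3, 1.2, 1.2, 1.1, 0.9, 0.9, 0.7, 0.6]


def ranks_to_reach(probabilities, threshold):
    """Return (rank, cumulative total) at the first rank where the running
    sum meets the threshold, or None if the threshold is never reached."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank, total
    return None


rank, total = ranks_to_reach(probs, 50.0)
print(rank, round(total, 1))  # 7 52.1
```

The first six journals sum to 47.4%, so the seventh is the first rank at which the cumulative probability passes 50%.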