WINDEX: A hierarchical integration of site- and window-based statistics for characterizing the footprint of positive selection in genome-wide population genetic data

Snell, H.; McCallum, S.; Raghavan, D.; Singh, R.; Ramachandran, S.; Sugden, L.

2026-03-26 evolutionary biology
10.64898/2026.03.26.714384 bioRxiv

Adaptive mutations, or mutations that confer a fitness benefit, can leave behind distinct signals in genetic data. Computational methods have improved the localization of adaptive mutations in genetic samples using a range of statistical and machine learning classification techniques. However, these methods miss the opportunity to jointly integrate site-based and window-based statistics, and thus fail to harness all available statistical evidence for detecting selection. Our method, WINDEX, combines statistics at these different resolutions to improve the detection of adaptive mutations among hitchhiking signals. Our model integrates emissions at both resolutions simultaneously by defining site-based and window-based latent states corresponding to neutral, linked, and sweep regions, with the site-based states and transition models nested within the window-based states. Using evolutionary simulations with varying selection parameters, we validate the ability of WINDEX to classify positive selective sweeps. Using data from the 1000 Genomes Project, we show that WINDEX identifies regions harboring signals of selective sweeps and provides improved localization within those regions over existing methods. In addition, applying WINDEX genome-wide allows us to estimate the proportion of the genome under positive selective pressure; our estimates of 9.7% to 10.5% across different populations support other preliminary estimates of this quantity.

Author summary

Population geneticists often seek evidence for positive selective sweeps, evolutionary events in which a beneficial allele, one that increases the fitness of the individuals carrying it, rises in frequency in a population. Positive selective sweeps, however, are difficult to detect because of varying patterns of linkage disequilibrium (LD), the nonrandom association of alleles, and detecting these signals reliably across differing LD structures remains a challenge in the field. In this work, we present WINDEX, a probabilistic framework in the form of a hierarchical hidden Markov model (HHMM) that leverages signals of positive selective sweeps at both the site and window levels to localize sweep regions in aligned haplotype data. We validate WINDEX in evolutionary simulations across varying positive selective sweep scenarios, showcasing the improved resolution that the HHMM structure provides. We apply WINDEX in comparative genomic scans of canonical sites of positive selection, as well as in whole-genome scans, to demonstrate the tool's power in localizing functionally validated signals of selection and to offer insights into the proportion of the human genome currently under positive selective pressure. WINDEX is publicly available and straightforward to apply to a wide range of human genetic datasets.
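To make the idea of "site-based states and transition models nested within window-based states" concrete, the following is a minimal, generic two-level HMM sketch. It is not WINDEX's code, model, or parameters. It assumes one scalar statistic per window and one per site, Gaussian emissions, made-up toy parameters, and a site-level chain that restarts at each window boundary. Every name here (`window_loglik`, `window_posteriors`, `sweep_fraction`, `SITE_TRANS`, and so on) is hypothetical. Under these assumptions, the emission of a window under window state `w` is the window-statistic likelihood multiplied by the likelihood of that window's site statistics under the site-level HMM attached to `w`. A forward-backward pass over windows then gives posterior state probabilities per window.

```python
import math

# Hypothetical labels mirroring the neutral/linked/sweep classes described above.
STATES = ("neutral", "linked", "sweep")


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_normal(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_matrix(rows):
    return [[math.log(p) for p in row] for row in rows]


def forward_loglik(obs, log_init, log_trans, log_emit):
    """Log-space forward algorithm: log P(obs), summed over all hidden state paths."""
    n = len(log_init)
    alpha = [log_init[k] + log_emit(k, obs[0]) for k in range(n)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[j] + log_trans[j][k] for j in range(n)]) + log_emit(k, x)
                 for k in range(n)]
    return logsumexp(alpha)


# Toy, made-up parameters; these are NOT fitted WINDEX values.
LOG_INIT = [math.log(p) for p in (0.80, 0.15, 0.05)]
WIN_TRANS = log_matrix([[0.90, 0.08, 0.02], [0.10, 0.80, 0.10], [0.05, 0.25, 0.70]])
SITE_TRANS = {  # one site-level transition model nested inside each window state
    0: log_matrix([[0.98, 0.015, 0.005], [0.30, 0.69, 0.01], [0.30, 0.30, 0.40]]),
    1: log_matrix([[0.60, 0.39, 0.01], [0.05, 0.90, 0.05], [0.10, 0.60, 0.30]]),
    2: log_matrix([[0.50, 0.40, 0.10], [0.05, 0.80, 0.15], [0.02, 0.18, 0.80]]),
}
SITE_MEAN, SITE_SD = (0.0, 1.0, 3.0), 1.0  # site-statistic mean per site state
WIN_MEAN, WIN_SD = (0.0, 1.5, 3.0), 1.0    # window-statistic mean per window state


def window_loglik(w, window):
    """Log emission of one window under window state w: window term + nested site HMM."""
    win_stat, site_stats = window

    def site_emit(k, x):
        return log_normal(x, SITE_MEAN[k], SITE_SD)

    return (log_normal(win_stat, WIN_MEAN[w], WIN_SD)
            + forward_loglik(site_stats, LOG_INIT, SITE_TRANS[w], site_emit))


def window_posteriors(windows, log_init, log_trans, emit):
    """Forward-backward over windows; returns P(window state | all data) per window."""
    n, T = len(log_init), len(windows)
    e = [[emit(w, win) for w in range(n)] for win in windows]
    fwd = [[log_init[k] + e[0][k] for k in range(n)]]
    for t in range(1, T):
        fwd.append([logsumexp([fwd[-1][j] + log_trans[j][k] for j in range(n)]) + e[t][k]
                    for k in range(n)])
    bwd = [[0.0] * n]  # log(1) at the last window
    for t in range(T - 2, -1, -1):
        bwd.insert(0, [logsumexp([log_trans[k][j] + e[t + 1][j] + bwd[0][j] for j in range(n)])
                       for k in range(n)])
    total = logsumexp(fwd[-1])
    return [[math.exp(fwd[t][k] + bwd[t][k] - total) for k in range(n)] for t in range(T)]


def sweep_fraction(posteriors, sweep_index=2):
    """Share of windows whose most probable state is 'sweep' (one simple summary)."""
    calls = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    return sum(c == sweep_index for c in calls) / len(calls)


# Each window: (window-level statistic, list of per-site statistics); toy values.
windows = [(0.1, [0.2, -0.5, 0.1]), (3.2, [2.8, 3.5, 3.1]), (0.3, [0.0, 0.4, -0.2])]
posts = window_posteriors(windows, LOG_INIT, WIN_TRANS, window_loglik)
for t, post in enumerate(posts):
    print(t, STATES[max(range(3), key=post.__getitem__)], [round(p, 3) for p in post])
print("fraction of windows called sweep:", sweep_fraction(posts))
```

All computation is done in log space so that long runs of sites do not underflow. `sweep_fraction` shows one simple way a genome-wide proportion could be summarized from per-window posteriors. It is an illustrative assumption, not necessarily how the authors computed their 9.7% to 10.5% estimates. A full HHMM could also let site-level states carry over across window boundaries, which this sketch deliberately omits.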
