SPrUCE: Utilizing Ultraconserved Elements of DNA for Population-Level Genetic Diversity Estimation
Melendez, D.; Sapci, A. O. B.; Bafna, V.; Mirarab, S.
Ultraconserved elements (UCEs) are ideal candidates for targeted sequencing and cost-effective acquisition of genome-wide data. UCEs have been widely used in phylogenetic studies to reconstruct evolutionary relationships, but their use in population-level research has been limited. This limited application stems from uncertainty over whether UCEs capture enough genetic variation to answer the population genomic questions central to ecology and biodiversity research. The concern is that UCEs are, by definition, highly conserved and may therefore lack sufficient within-species variation. The flanking regions (400-750 bp from the UCE core) are more variable and contain informative polymorphisms, but diversity declines toward the core. Consequently, any naive estimator of genetic diversity that ignores this conservation gradient is biased downward. In this paper, we introduce SPrUCE (Sigmoid Pi requiring UCEs), a reference-free method that estimates nucleotide diversity (π) from aligned UCE data. SPrUCE corrects the underestimation bias by modeling the change in diversity with distance from the UCE core using a Gompertz function. The model accounts for the bias introduced by the conserved core and yields more accurate per-site diversity estimates. We tested SPrUCE on UCE alignments from invertebrate and vertebrate taxa (finches, honeybees, sheep, and smelt). SPrUCE produces diversity values consistent with whole-genome-derived estimates, which require an assembled reference. It is fast, scalable, and effective even with missing data. Its modeling approach enables accurate population-level assessments of genetic diversity, offering a new and reliable option for conservation and population genetics.
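The idea of correcting a naive diversity estimate by fitting a sigmoid to diversity as a function of distance from the core can be illustrated with a small sketch. This is not SPrUCE's implementation. The abstract does not give the exact parameterization, fitting procedure, or how the corrected estimate is read off the fit. The sketch therefore makes several assumptions: it uses the common three-parameter Gompertz form a·exp(−b·exp(−c·d)), fits it by least squares over a coarse (b, c) grid, and treats the asymptote a as the flank-level per-site diversity. It also assumes the core position in each alignment is known. The function names (`site_pi`, `diversity_profile`, `fit_gompertz`) and the grid ranges are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def site_pi(column: str) -> Optional[float]:
    """Per-site nucleotide diversity: fraction of differing pairs among called bases.

    Gaps and ambiguous characters are treated as missing data (an assumption);
    returns None when fewer than two bases are called at this site.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    n = len(bases)
    if n < 2:
        return None
    # Ordered pairs sharing the same base, out of n*(n-1) ordered pairs in total.
    same = sum(k * (k - 1) for k in (bases.count(b) for b in "ACGT"))
    return 1.0 - same / (n * (n - 1))


def diversity_profile(alignment: Sequence[str], core_index: int) -> Tuple[List[int], List[float]]:
    """Pair each usable column's distance from the (assumed known) core with its pi."""
    dists, pis = [], []
    for j in range(len(alignment[0])):
        p = site_pi("".join(seq[j] for seq in alignment))
        if p is not None:
            dists.append(abs(j - core_index))
            pis.append(p)
    return dists, pis


def gompertz(d: float, a: float, b: float, c: float) -> float:
    """Gompertz curve rising from a*exp(-b) at the core (d=0) toward the asymptote a."""
    return a * math.exp(-b * math.exp(-c * d))


def fit_gompertz(dists, pis, b_grid=None, c_grid=None) -> Tuple[float, float, float]:
    """Least-squares Gompertz fit by grid search over (b, c); a is solved in closed form.

    For fixed b and c the model is linear in a, so the optimal a is
    sum(y*g) / sum(g*g) with g = exp(-b*exp(-c*d)).
    """
    b_grid = b_grid or [0.5 * k for k in range(1, 21)]     # hypothetical range: 0.5 .. 10
    c_grid = c_grid or [0.0005 * k for k in range(1, 81)]  # hypothetical range: 0.0005 .. 0.04 per bp
    best = None
    for b in b_grid:
        for c in c_grid:
            g = [math.exp(-b * math.exp(-c * d)) for d in dists]
            a = sum(y * x for y, x in zip(pis, g)) / sum(x * x for x in g)
            sse = sum((y - a * x) ** 2 for y, x in zip(pis, g))
            if best is None or sse < best[0]:
                best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


if __name__ == "__main__":
    # Synthetic profile generated from a known curve, for illustration only (not real data).
    d = list(range(0, 751, 10))
    y = [gompertz(x, 0.01, 3.0, 0.01) for x in d]
    a, b, c = fit_gompertz(d, y)
    naive = sum(y) / len(y)
    print(f"naive mean pi = {naive:.5f}, fitted asymptote a = {a:.5f}")
```

On the synthetic profile, the plain mean of per-site π sits below the fitted asymptote because sites near the core pull it down. This is the kind of underestimation the abstract describes. A closed-form solve for a inside a grid search keeps the sketch dependency-free. A real analysis would more likely use a proper nonlinear optimizer and pool profiles across many loci.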