SPrUCE: Utilizing Ultraconserved Elements of DNA for Population-Level Genetic Diversity Estimation

Melendez, D.; Sapci, A. O. B.; Bafna, V.; Mirarab, S.

2025-11-16 genomics
10.1101/2025.11.14.688492 bioRxiv

Ultraconserved elements (UCEs) provide ideal candidates for targeted sequencing and cost-effective acquisition of genome-wide data. While UCEs have been widely used in phylogenetic studies to reconstruct evolutionary relationships, their use in population-level research has been limited. This limited application stems from uncertainty over whether UCEs can capture the levels of genetic variation needed to answer population genomic questions central to ecology and biodiversity research. The concern is that, by definition, UCEs are highly conserved and may therefore lack sufficient within-species variation. The more variable flanking regions (400-750 bp from the UCE core) contain informative polymorphisms, though diversity decreases near the core. Thus, any naive estimator of genetic diversity that ignores this conservation will be biased toward underestimation. In this paper, we introduce SPrUCE (Sigmoid Pi requiring UCEs), a reference-free method that estimates nucleotide diversity (π) from aligned UCE data. SPrUCE corrects the underestimation bias by modeling the change in diversity away from the UCE core using a Gompertz function. The model accounts for the bias introduced by the conserved core and allows for more accurate per-site diversity estimates. We tested SPrUCE on UCE alignments from a range of taxa, including invertebrates and vertebrates (finches, honeybees, sheep, and smelt). SPrUCE produces diversity values consistent with whole-genome-derived estimates that require an assembled reference. It is fast, scalable, and effective even with missing data. Its modeling approach enables accurate population-level assessments of genetic diversity, offering a new and reliable option for conservation and population genetics.
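
To make the two ingredients named in the abstract concrete, here is a minimal sketch: a per-site nucleotide diversity computed from one alignment column, and a Gompertz curve that rises from near zero at the UCE core toward a plateau in the flanks. This is not the SPrUCE implementation. The function names, the three-parameter form a * exp(-b * exp(-c * x)), and the reading of the plateau a as the corrected per-site diversity are all assumptions made for illustration; the abstract does not specify the parameterization.

```python
import math
from collections import Counter


def per_site_pi(column):
    """Fraction of sequence pairs that differ at one alignment column.

    Gaps and ambiguous bases are skipped (an assumed way of handling
    missing data). Returns None when fewer than two bases remain.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    n = len(bases)
    if n < 2:
        return None
    counts = Counter(bases)
    same_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return 1.0 - same_pairs / total_pairs


def gompertz(x, a, b, c):
    """Gompertz curve a * exp(-b * exp(-c * x)).

    x: distance from the UCE core in bp (hypothetical variable choice)
    a: plateau reached far from the core
    b: how strongly diversity is suppressed at the core (x = 0)
    c: how quickly diversity recovers with distance
    """
    return a * math.exp(-b * math.exp(-c * x))


# Illustrative, made-up parameter values: diversity is strongly reduced
# at the core and approaches the plateau a in the flanks.
core_value = gompertz(0, a=0.01, b=5.0, c=0.01)
flank_value = gompertz(750, a=0.01, b=5.0, c=0.01)
```

In a sketch like this, per-site values would be pooled by distance from the core and the curve fitted to them, so that the estimate reflects the plateau rather than the average over conserved sites.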

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Molecular Ecology Resources, 161 papers in training set, Top 0.1%, 41.2%
2. Methods in Ecology and Evolution, 160 papers in training set, Top 0.2%, 19.6%
(50% of probability mass above)
3. Molecular Biology and Evolution, 488 papers in training set, Top 0.6%, 7.5%
4. Bioinformatics, 1061 papers in training set, Top 5%, 4.1%
5. Genome Research, 409 papers in training set, Top 1%, 2.7%
6. Bioinformatics Advances, 184 papers in training set, Top 2%, 2.2%
7. PLOS Computational Biology, 1633 papers in training set, Top 16%, 1.7%
8. Nature Communications, 4913 papers in training set, Top 55%, 1.3%
9. Nature Genetics, 240 papers in training set, Top 6%, 1.0%
10. Genome Biology, 555 papers in training set, Top 6%, 0.9%
11. PLOS Genetics, 756 papers in training set, Top 13%, 0.8%
12. NAR Genomics and Bioinformatics, 214 papers in training set, Top 3%, 0.8%
13. G3 Genes|Genomes|Genetics, 351 papers in training set, Top 2%, 0.8%
14. BMC Genomics, 328 papers in training set, Top 5%, 0.8%
15. Genome Biology and Evolution, 280 papers in training set, Top 2%, 0.8%
16. G3: Genes, Genomes, Genetics, 222 papers in training set, Top 1%, 0.7%
17. The American Journal of Human Genetics, 206 papers in training set, Top 4%, 0.7%
18. Genetics, 225 papers in training set, Top 5%, 0.7%
19. GENETICS, 189 papers in training set, Top 2%, 0.5%
20. Systematic Biology, 121 papers in training set, Top 0.5%, 0.5%
21. Proceedings of the National Academy of Sciences, 2130 papers in training set, Top 48%, 0.5%
22. Science, 429 papers in training set, Top 22%, 0.5%
23. New Phytologist, 309 papers in training set, Top 5%, 0.5%