
diempy: fast and reference-free genome polarisation

Setter, D.; Lohse, K.; Baird, S. J. E.

2026-03-10 evolutionary biology
10.64898/2026.02.18.706591 bioRxiv

Most ancestry-assignment methods rely on putatively pure reference panels, which are often unrealistic and bias inference. The genome polarisation algorithm diem, introduced previously, avoids reference panels by jointly inferring the polarity of common allelic states and quantifying variant diagnosticity via an expectation-maximisation procedure. Here we present diempy, an efficient Python implementation of diem coupled with tools that turn polarised calls into analysis-ready outputs. diempy offers lossless VCF-to-diem BED conversion; ploidy-aware handling of individuals and chromosomes; flexible masking of sites, regions and individuals; and interactive visualisation of polarised genomes, hybrid indices, clines and ternary plots. Post-processing functions include DI thresholding, kernel smoothing, and automatic detection and run-length encoding of contiguous ancestry tracts. BED-based I/O facilitates integration with population-genomic workflows (e.g. filtering by annotation or ploidy). These features make reference-free genome polarisation with diempy practical and reproducible for studies of population structure, admixture and species barriers.
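To make two of the post-processing ideas in the abstract concrete, here is a minimal sketch of run-length encoding of contiguous ancestry tracts and of a hybrid index computed from polarised states. It does not use diempy and is not its API. The function names and the state encoding are assumptions made for illustration: each site is one character, "0" or "2" for homozygous on either side of the barrier, "1" for heterozygous, "_" for a missing call, and the individual is treated as diploid.

```python
from itertools import groupby

# Assumed per-site encoding for one diploid individual (illustrative only):
# "0"/"2" = homozygous for either side, "1" = heterozygous, "_" = missing.

def run_length_encode(states):
    """Collapse per-site states into (state, start_index, length) runs."""
    runs = []
    pos = 0
    for state, group in groupby(states):
        length = sum(1 for _ in group)
        runs.append((state, pos, length))
        pos += length
    return runs

def hybrid_index(states):
    """Share of alleles assigned to side "2" among called sites (diploid)."""
    called = [int(s) for s in states if s in "012"]
    if not called:
        return float("nan")  # no called sites, index undefined
    return sum(called) / (2 * len(called))

example = "0001122220"
tracts = run_length_encode(example)
h = hybrid_index(example)
```

Here the start positions are indices into the site list rather than genomic coordinates; mapping runs back to base-pair positions would require the site coordinates, which this sketch leaves out.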

Matching journals

The top 3 journals account for 50% of the predicted probability mass.