Selecting genomes that matter: haplotype-based prioritization for iterative pangenome expansion
Marone, M. P.; Chen, E.; Himmelbach, A.; Haberer, G.; Spannagl, M.; Stein, N.; Mascher, M.
Background: As pangenomes approach saturation, identifying additional genomes that contribute novel sequence information becomes increasingly difficult. Current sample-selection strategies often rely on global diversity metrics or variant counts and do not explicitly account for the composition of an existing pangenome, a limitation that becomes increasingly relevant as pangenomes mature. Here, we present SelHap, a haplotype-based pipeline that uses whole-genome sequencing (WGS) data to prioritize accessions based on their contribution of novel haplotypes relative to a defined background, enabling targeted and iterative pangenome expansion.
Results: We applied SelHap to the barley pangenome, using 76 assembled genomes as a background to select new accessions from a large WGS panel. Using this approach, we generated chromosome-scale genome assemblies from 19 accessions selected with SelHap and from 17 elite lines selected based on their relevance in historical barley breeding. Across multiple benchmarking scenarios, SelHap-based selection consistently resulted in a greater increase in non-redundant (single-copy) pangenome sequence, demonstrating that prioritizing haplotype novelty relative to an existing background maximizes unrepresented sequence content.
Conclusions: By transforming complex haplotype-clustering outputs into interpretable summaries and ranked candidate lists, SelHap provides a practical framework for targeted pangenome expansion. Beyond sample selection, SelHap can facilitate ancestry and germplasm comparisons across diverse panels. As WGS data become more accessible, SelHap offers a scalable and interpretable solution for extending mature pangenomes by explicitly targeting previously unrepresented sequence space.
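The abstract does not describe how SelHap scores or ranks accessions internally, so the following is only a minimal sketch of the general idea of ranking candidates by haplotypes absent from a background. It assumes each genome can be summarized as a set of (window, haplotype cluster) pairs and uses a simple greedy, set-cover-style loop; the function name `rank_by_novel_haplotypes`, the data layout, and the toy accessions are all hypothetical and are not taken from SelHap.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: a haplotype is a (genomic window ID, cluster ID) pair.
Haplotype = Tuple[str, int]


def rank_by_novel_haplotypes(
    background: Set[Haplotype],
    candidates: Dict[str, Set[Haplotype]],
    n_select: int,
) -> List[Tuple[str, int]]:
    """Greedily pick candidates that add the most haplotypes not yet covered.

    Returns (accession, number of newly covered haplotypes) in pick order.
    Illustrative only; not SelHap's actual algorithm.
    """
    covered = set(background)
    remaining = dict(candidates)
    ranked: List[Tuple[str, int]] = []
    while remaining and len(ranked) < n_select:
        # Iterate in sorted name order so ties are broken deterministically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = len(remaining[best] - covered)
        if gain == 0:
            break  # nothing left that adds an unrepresented haplotype
        ranked.append((best, gain))
        # Update the background so the next pick is judged against it.
        covered |= remaining.pop(best)
    return ranked


# Toy data: three windows, background carries cluster 0 in each.
background = {("w1", 0), ("w2", 0), ("w3", 0)}
candidates = {
    "acc_A": {("w1", 0), ("w2", 1), ("w3", 2)},
    "acc_B": {("w1", 1), ("w2", 1), ("w3", 0)},
    "acc_C": {("w1", 0), ("w2", 0), ("w3", 0)},
}
ranking = rank_by_novel_haplotypes(background, candidates, n_select=3)
print(ranking)
```

Because the covered set grows after every pick, a candidate whose novel haplotypes overlap with an earlier pick is credited only for what it still adds, which mirrors the iterative expansion the abstract describes. In the toy data, `acc_C` is never selected because it carries nothing the background lacks.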