tidygenclust: Clustering for Population Genetics in R
Tysall, E. E.; Hovhannisyan, A.; Carter, E. J.; Padilla-Iglesias, C.; Colucci, M.; Pozzi, A. V.; Leonardi, M.; Fatima, A.; Pelanek, O.; Stephenson, N. P.; Manica, A.
Background: Population structure analysis is crucial for evolutionary research and medical genomics. Clustering methods, broadly categorised as model-based (e.g. ADMIXTURE) or non-model-based (e.g. SCOPE), differ in their methodology and computational efficiency. Recently, fastmixture, a model-based approach, has improved scalability and performance, while replicate alignment tools, such as Clumppling, extend previous methods by also aligning the modes across K values. However, existing tools are all standalone, generate numerous untracked text files, and offer limited plot customisability.

Results: We introduce an R package, tidygenclust, which brings the functionality of the original ADMIXTURE, fastmixture and Clumppling software into R, enabling a streamlined and integrated workflow. By integrating with tidypopgen, a package designed to handle large SNP datasets, these new tools maintain metadata, simplify data handling, and produce results as customisable ggplot2 objects for flexible visualisation.

Conclusions: The R package tidygenclust advances population genetic analysis by combining computational efficiency with reproducible workflows and user-friendly plotting. The source code and instructions are available at https://github.com/EvolEcolGroup/tidygenclust.