
tidygenclust: Clustering for Population Genetics in R

Tysall, E. E.; Hovhannisyan, A.; Carter, E. J.; Padilla-Iglesias, C.; Colucci, M.; Pozzi, A. V.; Leonardi, M.; Fatima, A.; Pelanek, O.; Stephenson, N. P.; Manica, A.

2025-07-31 bioinformatics
10.1101/2025.07.29.667403 bioRxiv

Background: Population structure analysis is crucial for evolutionary research and medical genomics. Clustering methods, broadly categorised as model-based (e.g. ADMIXTURE) or non-model-based (e.g. SCOPE), differ in their methodology and computational efficiency. Recently, fastmixture, a model-based approach, has improved scalability and performance, while replicate alignment tools, such as Clumppling, extend previous methods by also aligning the modes across K values. However, the existing tools are all standalone, generate numerous untracked text files, and offer limited plot customisation.

Results: We introduce an R package, tidygenclust, which brings the functionality of the original ADMIXTURE, fastmixture and Clumppling software into R, enabling a streamlined and integrated workflow. By integrating with tidypopgen, a package designed to handle large SNP datasets, these new tools maintain metadata, simplify data handling, and return results as customisable ggplot2 objects for flexible visualisation.

Conclusions: The R package tidygenclust advances population genetic analysis by combining computational efficiency with reproducible workflows and user-friendly plotting. The source code and instructions are available at https://github.com/EvolEcolGroup/tidygenclust.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. BMC Bioinformatics (383 papers in training set; Top 0.1%; 28.1%)
2. Bioinformatics (1061 papers in training set; Top 1%; 18.9%)
3. Methods in Ecology and Evolution (160 papers in training set; Top 0.4%; 10.2%)

50% of probability mass above

4. Bioinformatics Advances (184 papers in training set; Top 0.2%; 8.5%)
5. Journal of Open Source Software (22 papers in training set; Top 0.1%; 4.9%)
6. PLOS ONE (4510 papers in training set; Top 31%; 4.9%)
7. PLOS Computational Biology (1633 papers in training set; Top 9%; 3.6%)
8. Molecular Ecology Resources (161 papers in training set; Top 0.5%; 1.9%)
9. GigaScience (172 papers in training set; Top 1%; 1.8%)
10. PeerJ (261 papers in training set; Top 8%; 1.5%)
11. Nucleic Acids Research (1128 papers in training set; Top 15%; 1.0%)
12. European Journal of Human Genetics (49 papers in training set; Top 1%; 0.9%)
13. BMC Genomics (328 papers in training set; Top 5%; 0.8%)
14. Frontiers in Genetics (197 papers in training set; Top 9%; 0.8%)
15. Genome Biology and Evolution (280 papers in training set; Top 2%; 0.8%)
16. F1000Research (79 papers in training set; Top 4%; 0.8%)
17. G3 Genes|Genomes|Genetics (351 papers in training set; Top 2%; 0.8%)
18. Genetics (225 papers in training set; Top 4%; 0.8%)
19. Scientific Reports (3102 papers in training set; Top 76%; 0.7%)
20. Nature Communications (4913 papers in training set; Top 64%; 0.7%)
21. Genetic Epidemiology (46 papers in training set; Top 0.9%; 0.7%)
22. Forensic Science International: Genetics (24 papers in training set; Top 0.2%; 0.7%)
23. Genome Research (409 papers in training set; Top 5%; 0.7%)