Benchmark of Wide Range of Pairwise Distance Metrics for Automated Classification of Mouse Mutant Phenotypes from Flow Cytometry Data
May, M.; Hewitt, T.; Mashford, B.; Hammill, D.; Davies, A.; Andrews, T. D.
Precision medicine requires a comprehensive mapping of genotype to phenotype so that patients can receive individually tailored treatment. However, when flow cytometry is used to identify phenotypes, such as the quantities of the immune cell populations in tissue and blood that are used to identify autoimmune disorders, the high-dimensional nature of the data often makes it unclear which cellular phenotypes come from healthy individuals and which from diseased ones, especially once the effects of population diversity are included. To identify and segregate healthy phenotypes from various disease phenotypes, we compute pairwise distance metrics between the samples' cell populations. By comparing distance metrics on C57BL/6 clone mice carrying mutations of known phenotype, we find that cosine similarity is best suited for segregating wildtype from mutant samples while respecting minute differences in already small cell populations, and that standardised Euclidean distance is best suited for machine-learning input due to its sensitivity. Both metrics outperform the other tested metrics (including Aitchison, Euclidean, Manhattan, Earth Mover's Distance, and squared Euclidean). We demonstrate the utility of these pairwise metrics by applying them to a classification task on known mutant phenotypes, using an existing FACS phenotype dataset derived from X000 inbred C57BL/6 mice that harbour potentially phenotypic genetic variation introduced through ENU mutagenesis of individual pedigree-founding G0 male mice.
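As a rough illustration of what computing pairwise distances between samples' cell populations can look like, the sketch below uses SciPy on a small invented table of cell-population fractions. It is not the authors' pipeline: the toy values, the helper names `pairwise_distances` and `aitchison_distances`, and the choice of SciPy are all assumptions made for this example. Note that SciPy's `"cosine"` metric returns cosine distance (one minus cosine similarity), and that the Aitchison distance is computed here as the Euclidean distance between centred log-ratio transformed rows, which requires strictly positive fractions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pairwise_distances(freqs, metric):
    """Square matrix of pairwise distances between samples (rows of freqs)."""
    return squareform(pdist(freqs, metric=metric))


def aitchison_distances(freqs):
    """Aitchison distance: Euclidean distance after a centred log-ratio transform.

    Assumes every entry of freqs is strictly positive.
    """
    logs = np.log(freqs)
    clr = logs - logs.mean(axis=1, keepdims=True)
    return squareform(pdist(clr, metric="euclidean"))


# Hypothetical toy data (not from the paper): rows are samples, columns are
# fractions of four cell populations. The first three rows stand in for
# wildtype-like samples, the last row for a mutant-like sample.
samples = np.array([
    [0.60, 0.30, 0.08, 0.02],
    [0.58, 0.31, 0.09, 0.02],
    [0.61, 0.29, 0.08, 0.02],
    [0.40, 0.30, 0.20, 0.10],
])

cosine = pairwise_distances(samples, "cosine")        # 1 - cosine similarity
seuclid = pairwise_distances(samples, "seuclidean")   # per-column variance scaling
manhattan = pairwise_distances(samples, "cityblock")
aitchison = aitchison_distances(samples)

for name, dist in [("cosine", cosine), ("seuclidean", seuclid),
                   ("manhattan", manhattan), ("aitchison", aitchison)]:
    # Compare a wildtype-like pair (0, 1) with a wildtype/mutant-like pair (0, 3).
    print(f"{name:>10}: d(0,1)={dist[0, 1]:.4f}  d(0,3)={dist[0, 3]:.4f}")
```

In this toy table every metric places the mutant-like row further from sample 0 than the other wildtype-like rows are; how well that separation holds on real data is the question the benchmark addresses.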
Matching journals
The top 6 journals account for 50% of the predicted probability mass.