cspray: Distributed Single Cell Transcriptome Analysis
Hawkins, P. G.; Swanson, E. M.; Feichtel, M.
The size of individual single cell samples continues to grow with advancing technologies, as does the number of samples included in individual experiments and across organizations. This presents challenges for processing these data at scale, both in computational throughput and in the size of the machines required to process them. We present a single cell RNA processing method that is fully distributed and capable of processing arbitrarily large files, and arbitrarily many files, without requiring per-file compute sizing. Our method, cspray, includes data ingestion, preprocessing, highly variable gene annotation, PCA, and clustering. We also show that processing at this scale permits LLM-based, reference-free cluster annotation on low-resolution clusters, demonstrating that these techniques can be used to build single cell data discovery platforms at scale.
Matching journals
The top 6 journals account for 50% of the predicted probability mass.