
scprocess: a pipeline for processing, integrating and visualising atlas-scale single cell data

Koderman, M.; Pilarski, J.; Bianco, E.; Gonzalez, D.; Robinson, M. D.; Macnair, W.

2026-03-13 bioinformatics
10.64898/2026.03.09.710141 bioRxiv

Motivation
The transition toward "atlas-scale" single cell research has produced datasets comprising millions of cells across hundreds of samples, creating significant challenges for data management, computational efficiency, and reproducibility. While numerous methods are available for individual steps in single cell data processing, the highly complex nature of the analysis makes it difficult to keep a clear record of every tool and parameter used. This makes final results hard to reproduce and highlights the need for a unified workflow that integrates multiple steps into a cohesive framework.

Results
scprocess is a Snakemake pipeline designed to streamline and automate the complex steps involved in processing single cell RNA sequencing data. Optimized specifically for data generated using 10x Genomics technology, it provides a comprehensive solution that transforms raw sequencing files into standardized outputs suitable for a variety of downstream tasks. The pipeline is built to support the analysis of datasets comprising many (e.g. 100+) samples via a simple CLI, allowing researchers to explore their datasets efficiently while ensuring reproducibility and scalability in their workflows.

Availability and implementation
scprocess can be installed via GitHub (https://github.com/marusakod/scprocess) under the MIT license. Documentation, including setup instructions and tutorials on example datasets, is available at https://marusakod.github.io/scprocess/.
