Disagreement between demultiplexing methods reveals structured cell quality gradients in multiplexed single-cell data
Sen, E.; Steiger, S.; Basic, M.; Prokoph, N.; Syed, A. P.; Seufert, I.; Rehman, U.-U.; Schumacher, S.; Baumann, A.; Feuring, M.; Weinhold, N.; Lübbert, M.; Döhner, H.; Döhner, K.; Raab, M. S.; Mallm, J.-P.; Stegle, O.; Rippe, K.
Background: Single-cell multi-omics profiling of hematopoietic malignancies frequently involves pooling patient samples before library preparation to reduce costs. Demultiplexing and quality control of the resulting sequencing data depend on experimental design, sequencing depth, and computational methods. Existing approaches benchmark individual tools, auto-select a single best method, or apply majority voting. However, none systematically exploit disagreement patterns among orthogonal strategies as a diagnostic signal for cell quality.

Results: We introduce Split-flow, a modular Nextflow pipeline that runs hashing-based demultiplexing, SNP-based demultiplexing, and transcriptome-based doublet detection in parallel. It classifies cells into quality strata through a concordance-based decision framework. Validation on multiplexed CITE-seq data from 14 multiple myeloma patients across eight Chromium channels demonstrates high reproducibility and shows that discordant cells cluster within specific cell types and quality strata. TCR clonotype cross-referencing against VDJdb confirms that concordance-based classification enriches for biologically genuine immune receptor sequences, with a 5.3-fold enrichment of confirmed public TCR sequences in the high-confidence stratum. Downsampling analysis reveals that SNP-based methods are more depth-sensitive than hash-based approaches, supporting the recommendation to combine both strategies. The framework transfers to AML samples across three assay types (snMultiome-seq, scRNA-seq, scATAC-seq), where ATAC-based demultiplexing resolves donor assignment discordance under low hashing efficiency.

Conclusions: Split-flow demonstrates that combining orthogonal preprocessing methods yields structured information about cell quality, and its concordance-based framework turns disagreement between methods into a diagnostic signal.
It introduces a preprocessing approach that can be exploited beyond hematopoietic malignancies in multiplexed single-cell applications.

Graphical abstract: Figure 1

Highlights and main findings
- Introduces Split-flow, a modular Nextflow DSL2 pipeline for preprocessing of multiplexed single-cell multi-omics sequencing data from hematopoietic malignancy samples via a post hoc concordance-based decision framework.
- Provides practical guidance for the experimental design of multiplexed single-cell multi-omics experiments, including the recommendation to combine antibody-based hashing with a SNP genotype reference for orthogonal demultiplexing.
- Reveals that SNP-based demultiplexing is more sensitive to sequencing depth than hash-based approaches, and that the combined strategy mitigates depth-dependent biases in cell-type recovery.
- Demonstrates that disagreement between demultiplexing methods contains structured diagnostic information about cell quality, with concordance categories reflecting genuine quality gradients in multiple myeloma CITE-seq samples.
- Validates the concordance framework using T cell receptor sequences as an orthogonal biological readout, with a 5.3-fold enrichment of confirmed public TCR sequences in the high-confidence stratum.
- Applies the preprocessing framework to AML patient samples across three assay types (snMultiome-seq, scRNA-seq, and scATAC-seq) and demonstrates that ATAC-based demultiplexing can resolve donor-assignment discordance.
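
To make the idea of a concordance-based decision concrete, the following is a minimal Python sketch of how per-cell calls from a hashing-based demultiplexer, a SNP-based demultiplexer, and a doublet detector might be combined into strata. It is not Split-flow's implementation: the function name `classify_cell`, the stratum labels, the rule order, and the example barcodes and donor IDs are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Optional


def classify_cell(hash_donor: Optional[str],
                  snp_donor: Optional[str],
                  is_doublet: bool) -> str:
    """Combine three per-cell calls into one illustrative stratum label.

    hash_donor / snp_donor: donor assigned by each method, or None if the
    method made no singlet assignment. All labels below are hypothetical.
    """
    if is_doublet:
        return "doublet"            # transcriptome-based doublet call wins
    if hash_donor is None and snp_donor is None:
        return "unassigned"         # neither method assigned a donor
    if hash_donor is None or snp_donor is None:
        return "single_method"      # only one method assigned a donor
    if hash_donor == snp_donor:
        return "concordant"         # both methods agree on the donor
    return "discordant"             # both assigned, but to different donors


# Toy input: barcode -> (hashing call, SNP call, doublet flag). Invented data.
cells = {
    "AAAC-1": ("P01", "P01", False),
    "AAAG-1": ("P01", "P02", False),
    "AACT-1": (None, "P03", False),
    "AAGT-1": ("P02", "P02", True),
    "ACGT-1": (None, None, False),
}

strata = {barcode: classify_cell(*calls) for barcode, calls in cells.items()}
counts = Counter(strata.values())

for barcode, label in strata.items():
    print(barcode, label)
```

In this sketch the discordant and single-method cells are kept as labeled categories rather than discarded, which is the property that allows disagreement to be inspected per cell type or per quality metric downstream.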