
Disagreement between demultiplexing methods reveals structured cell quality gradients in multiplexed single-cell data

Sen, E.; Steiger, S.; Basic, M.; Prokoph, N.; Syed, A. P.; Seufert, I.; Rehman, U.-U.; Schumacher, S.; Baumann, A.; Feuring, M.; Weinhold, N.; Lübbert, M.; Döhner, H.; Döhner, K.; Raab, M. S.; Mallm, J.-P.; Stegle, O.; Rippe, K.

2026-05-13 bioinformatics
10.64898/2026.05.10.724135 bioRxiv

Background: Single-cell multi-omics profiling of hematopoietic malignancies frequently involves pooling patient samples before library preparation to reduce costs. Demultiplexing and quality control of the resulting sequencing data depend on experimental design, sequencing depth, and computational methods. Existing approaches benchmark individual tools, auto-select a single best method, or apply majority voting. However, none systematically exploits disagreement patterns among orthogonal strategies as a diagnostic signal for cell quality.

Results: We introduce Split-flow, a modular Nextflow pipeline that runs hashing-based demultiplexing, SNP-based demultiplexing, and transcriptome-based doublet detection in parallel. It classifies cells into quality strata through a concordance-based decision framework. Validation on multiplexed CITE-seq data from 14 multiple myeloma patients across eight Chromium channels demonstrates high reproducibility and shows that discordant cells cluster within specific cell types and quality strata. TCR clonotype cross-referencing against VDJdb confirms that concordance-based classification enriches for biologically genuine immune receptor sequences, with a 5.3-fold enrichment of confirmed public TCR sequences in the high-confidence stratum. Downsampling analysis reveals that SNP-based methods are more depth-sensitive than hash-based approaches, supporting the recommendation to combine both strategies. The framework transfers to AML samples across three assay types (snMultiome-seq, scRNA-seq, scATAC-seq), where ATAC-based demultiplexing resolves donor assignment discordance under low hashing efficiency.

Conclusions: Split-flow demonstrates that combining orthogonal preprocessing methods yields structured information about cell quality, and it offers a concordance-based framework that turns disagreement between those methods into a diagnostic signal. It introduces a preprocessing approach that can be applied beyond hematopoietic malignancies in other multiplexed single-cell applications.

Graphical abstract: Figure 1

Highlights and main findings
- Introduces Split-flow, a modular Nextflow DSL2 pipeline for preprocessing of multiplexed single-cell multi-omics sequencing data from hematopoietic malignancy samples via a post hoc concordance-based decision framework.
- Provides practical guidance for the experimental design of multiplexed single-cell multi-omics experiments, including the recommendation to combine antibody-based hashing with a SNP genotype reference for orthogonal demultiplexing.
- Reveals that SNP-based demultiplexing is more sensitive to sequencing depth than hash-based approaches, and that the combined strategy mitigates depth-dependent biases in cell-type recovery.
- Demonstrates that disagreement between demultiplexing methods contains structured diagnostic information about cell quality, with concordance categories reflecting genuine quality gradients in multiple myeloma CITE-seq samples.
- Validates the concordance framework using T cell receptor sequences as an orthogonal biological readout, with a 5.3-fold enrichment of confirmed public TCR sequences in the high-confidence stratum.
- Applies the preprocessing framework to AML patient samples across three assay types (snMultiome-seq, scRNA-seq, and scATAC-seq) and demonstrates that ATAC-based demultiplexing can resolve donor-assignment discordance.
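
To make the idea of a concordance-based decision concrete, the following is a minimal Python sketch and not the Split-flow implementation. It assumes each cell barcode carries a donor call from a hashing-based method, a donor call from a SNP-based method, and a doublet flag from a transcriptome-based detector. The function name, the category labels, and the toy barcodes are all hypothetical; the abstract does not specify the actual strata or decision rules.

```python
from collections import Counter


def classify_cell(hash_call, snp_call, is_doublet):
    """Return a hypothetical concordance category for one cell barcode.

    hash_call, snp_call: donor label from each method, or None if that
    method made no assignment. is_doublet: flag from a doublet detector.
    The categories are illustrative, not those used by Split-flow.
    """
    if is_doublet:
        return "doublet"
    if hash_call is None and snp_call is None:
        return "unassigned"
    if hash_call is None or snp_call is None:
        return "single_method"  # only one method assigned a donor
    if hash_call == snp_call:
        return "concordant"  # both methods agree on the donor
    return "discordant"  # both methods assigned, but to different donors


# Toy input: barcode -> (hashing call, SNP call, doublet flag).
cells = {
    "AAAC": ("donor1", "donor1", False),
    "AAAG": ("donor1", "donor2", False),
    "AAAT": (None, "donor2", False),
    "AACA": ("donor2", "donor2", True),
    "AACC": (None, None, False),
}

labels = {barcode: classify_cell(*calls) for barcode, calls in cells.items()}
counts = Counter(labels.values())
```

Tabulating categories such as `counts` per cell type or per sequencing channel is one way the distribution of discordant cells could then be inspected.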

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics; 1061 papers in training set; Top 0.9%; 25.8%
2. BMC Bioinformatics; 383 papers in training set; Top 1.0%; 9.1%
3. Nucleic Acids Research; 1128 papers in training set; Top 3%; 6.3%
4. Bioinformatics Advances; 184 papers in training set; Top 0.5%; 6.3%
5. PLOS Computational Biology; 1633 papers in training set; Top 9%; 3.7%
50% of probability mass above
6. Cytometry Part A; 30 papers in training set; Top 0.1%; 3.6%
7. Briefings in Bioinformatics; 326 papers in training set; Top 2%; 3.6%
8. Nature Communications; 4913 papers in training set; Top 40%; 3.6%
9. Cell Reports Methods; 141 papers in training set; Top 1%; 3.1%
10. Genome Biology; 555 papers in training set; Top 3%; 2.7%
11. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 2%; 2.7%
12. iScience; 1063 papers in training set; Top 18%; 1.5%
13. Communications Biology; 886 papers in training set; Top 13%; 1.3%
14. ImmunoInformatics; 11 papers in training set; Top 0.1%; 1.3%
15. JCO Clinical Cancer Informatics; 18 papers in training set; Top 0.6%; 1.3%
16. Frontiers in Immunology; 586 papers in training set; Top 5%; 1.2%
17. Leukemia; 39 papers in training set; Top 0.6%; 1.2%
18. Patterns; 70 papers in training set; Top 2%; 1.2%
19. Genome Medicine; 154 papers in training set; Top 7%; 0.9%
20. Computers in Biology and Medicine; 120 papers in training set; Top 4%; 0.8%
21. BMC Medical Genomics; 36 papers in training set; Top 1%; 0.8%
22. npj Precision Oncology; 48 papers in training set; Top 1%; 0.8%
23. GigaScience; 172 papers in training set; Top 3%; 0.7%
24. Frontiers in Bioinformatics; 45 papers in training set; Top 0.9%; 0.7%
25. mAbs; 28 papers in training set; Top 0.4%; 0.7%
26. NAR Genomics and Bioinformatics; 214 papers in training set; Top 4%; 0.7%
27. Frontiers in Genetics; 197 papers in training set; Top 10%; 0.7%
28. PLOS ONE; 4510 papers in training set; Top 69%; 0.7%
29. Cell Systems; 167 papers in training set; Top 13%; 0.7%
30. eLife; 5422 papers in training set; Top 62%; 0.6%