Robust and cost-efficient single-cell sequencing through combinatorial pooling

Gawron, J.; Cunha, L.; Borgsmueller, N.; Beerenwinkel, N.

2024-11-23 bioinformatics

10.1101/2024.11.22.624460 bioRxiv


Single-cell sequencing is widely used to study molecular cell-to-cell heterogeneity. Even though the cost of sequencing has dropped over the last decades, single-cell assays remain expensive because they require strategies to index molecules by cell. The high cost of indexing can be mitigated by pooling samples prior to sequencing library preparation. Computational methods have been developed that leverage molecular features distinguishing the samples to separate each pool into distinct datasets. However, since all multiplexed samples are processed in the same way, information on the origin of each demultiplexed dataset is lost. To map datasets to their sample of origin, additional information such as molecular indexing or additional genotyping is needed. Here, we propose a class of experimental designs that allows identifying the sample of origin of each demultiplexed dataset, relying only on the genetic profiles of the samples and the composition of the pools. Our approach is based on splitting and pooling samples in specific combinations. We find a most cost-efficient experimental design in this class and prove its optimality. We present a dynamic programming algorithm that iteratively simplifies an optimal experimental design by breaking it into several independent designs while maintaining optimality. Furthermore, we propose a subclass of experimental designs that allows robust sample identification even under partial failure of the experiment, and we present a provably optimal design in this subclass. We provide an implementation for automatic sample identification under these optimal combinatorial pooling strategies and demonstrate its functionality in a simulation study.
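The idea of identifying samples from pool composition alone can be illustrated with a minimal sketch. It is not the optimal design from the paper: it simply assumes that each sample is split into a distinct, non-empty set of pools, and that genotype clusters have already been matched across pools, so that the set of pools in which a genotype is detected identifies its sample. The function names `assign_pool_signatures` and `identify_samples` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def assign_pool_signatures(n_samples, n_pools):
    """Give each sample a distinct non-empty set of pools, smallest sets first.

    Illustrative scheme only (an assumption, not the paper's optimal design).
    """
    signatures = []
    for size in range(1, n_pools + 1):
        for combo in combinations(range(n_pools), size):
            if len(signatures) == n_samples:
                return signatures
            signatures.append(frozenset(combo))
    if len(signatures) == n_samples:
        return signatures
    raise ValueError("not enough pools: need n_samples <= 2**n_pools - 1")


def identify_samples(signatures, observed):
    """Map each demultiplexed genotype to the sample with a matching signature.

    observed: dict mapping a genotype label to the set of pools in which
    that genotype was detected. Unmatched genotypes map to None.
    """
    lookup = {sig: sample for sample, sig in enumerate(signatures)}
    return {label: lookup.get(frozenset(pools)) for label, pools in observed.items()}


# Five samples spread over three pools.
signatures = assign_pool_signatures(5, 3)
# Genotype "A" was seen in pools 0 and 1, genotype "B" only in pool 2.
assignment = identify_samples(signatures, {"A": {0, 1}, "B": {2}})
```

In this toy scheme, three pools can distinguish at most seven samples, since there are only seven non-empty subsets of three pools. Which subsets to use for cost-efficiency or for robustness to pool failure is the question the paper's designs address.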

