Estimating and correcting index hopping misassignments in single-cell RNA-seq data

Miao, L.; Collado, L.; Barkdull, S.; Saito, Y.; Jo, J.-H.; Han, J.; Dell'Orso, S.; Kelly, M. C.; Conlan, S.; Kong, H. H.; Brownell, I.

2024-10-24 genomics
10.1101/2024.10.21.619353 bioRxiv
Background
Index hopping causes read-assignment errors in data from multiplexed sequencing libraries. This issue has become more prevalent with the widespread use of high-capacity sequencers and highly multiplexed single-cell RNA sequencing (scRNA-seq).

Results
We conducted deep, plate-based scRNA-seq on a mixed population of mouse skin cells. Analysis of transcriptomes from 1152 cells identified four distinct cell types. To estimate the error rate in sample assignment due to index hopping, we used differential expression analysis to identify signature genes that were highly and specifically expressed in each cell type. We quantified the proportion of misassigned reads by examining the detection rates of signature genes in the other cell types. Remarkably, regardless of gene expression level, we estimated that 0.65% of reads per gene were assigned to incorrect cells across our data. To computationally compensate for index hopping, we developed a simple correction method in which, for each gene, 0.65% of the library's average expression level was subtracted from the expression in each cell. This correction had notable effects on transcriptome analyses, including increased cell-cell clustering distance and altered intermediate-state assignments during cell differentiation.

Conclusions
Index hopping misassignments are measurable and can affect the interpretation of sequencing results. We devised a straightforward method to estimate and correct for the index hopping rate by quantifying misassigned genes in distinct cell types within an scRNA-seq library. This approach can be applied to any barcoded, multiplexed scRNA-seq library containing cells with distinct expression profiles, allowing the expression matrix to be corrected before biological analysis.
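The estimation and correction steps in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the hypothetical `signature_genes` mapping, the per-gene ratio estimator (off-type mean divided by library mean), the median across genes, and the clipping of negative values at zero are all illustrative choices. Only the idea of subtracting a fixed fraction of each gene's library-wide average from every cell comes from the source.

```python
import numpy as np


def estimate_hopping_rate(counts, labels, signature_genes):
    """Estimate the per-gene fraction of reads misassigned by index hopping.

    counts: cells x genes array of read counts.
    labels: one cell-type label per cell.
    signature_genes: hypothetical dict mapping each cell type to the column
        indices of genes assumed to be expressed only in that type.

    Assumed model: each cell receives `rate` times the library-wide mean of
    every gene through hopping, so a signature gene's mean count in cells of
    other types is approximately rate * library mean.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    library_mean = counts.mean(axis=0)  # average expression of each gene
    ratios = []
    for cell_type, genes in signature_genes.items():
        other = labels != cell_type  # cells that should not express these genes
        for g in genes:
            if library_mean[g] > 0:
                ratios.append(counts[other, g].mean() / library_mean[g])
    # Take the median across signature genes (an illustrative choice).
    return float(np.median(ratios))


def correct_hopping(counts, rate):
    """Subtract rate * library mean of each gene from every cell.

    Clipping negative results at zero is an assumption, not from the source.
    """
    counts = np.asarray(counts, dtype=float)
    background = rate * counts.mean(axis=0)  # expected hopped reads per gene
    return np.clip(counts - background, 0.0, None)


# Toy data: 5 cells of type "A" express gene 0, 5 cells of type "B" express
# gene 1. Hopping is simulated (an assumption) by moving a fraction r of each
# cell's reads into a pool that is shared evenly across all cells.
r = 0.0065
true = np.zeros((10, 2))
true[:5, 0] = 1000.0
true[5:, 1] = 1000.0
labels = np.array(["A"] * 5 + ["B"] * 5)
observed = (1 - r) * true + r * true.mean(axis=0)

rate = estimate_hopping_rate(observed, labels, {"A": [0], "B": [1]})
corrected = correct_hopping(observed, rate)
```

In this toy example the estimate recovers the simulated 0.65% rate and the correction removes the off-type signal. The on-type counts stay at (1 - r) times their true value, because subtracting a background does not restore reads that hopped out of a cell.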

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. BMC Genomics: 22.9% predicted probability; top 0.1%; 328 papers in training set
2. BMC Bioinformatics: 10.3% predicted probability; top 0.8%; 383 papers in training set
3. Bioinformatics: 6.5% predicted probability; top 4%; 1061 papers in training set
4. NAR Genomics and Bioinformatics: 6.4% predicted probability; top 0.2%; 214 papers in training set
5. Scientific Reports: 4.9% predicted probability; top 22%; 3102 papers in training set
6. GigaScience: 4.0% predicted probability; top 0.4%; 172 papers in training set
7. PLOS ONE: 3.7% predicted probability; top 38%; 4510 papers in training set
8. Genome Biology: 3.7% predicted probability; top 2%; 555 papers in training set
9. Bioinformatics Advances: 3.1% predicted probability; top 2%; 184 papers in training set
10. Computational and Structural Biotechnology Journal: 2.1% predicted probability; top 3%; 216 papers in training set
11. Cell Reports Methods: 2.1% predicted probability; top 2%; 141 papers in training set
12. Biology Methods and Protocols: 1.7% predicted probability; top 0.9%; 53 papers in training set
13. BMC Biology: 1.7% predicted probability; top 1%; 248 papers in training set
14. PLOS Computational Biology: 1.5% predicted probability; top 17%; 1633 papers in training set
15. Methods in Ecology and Evolution: 1.4% predicted probability; top 2%; 160 papers in training set
16. Nucleic Acids Research: 1.2% predicted probability; top 13%; 1128 papers in training set
17. Nature Communications: 1.0% predicted probability; top 58%; 4913 papers in training set
18. PeerJ: 1.0% predicted probability; top 11%; 261 papers in training set
19. F1000Research: 1.0% predicted probability; top 3%; 79 papers in training set
20. G3: Genes, Genomes, Genetics: 1.0% predicted probability; top 0.7%; 222 papers in training set
21. Frontiers in Genetics: 0.9% predicted probability; top 8%; 197 papers in training set
22. iScience: 0.9% predicted probability; top 26%; 1063 papers in training set
23. G3 Genes|Genomes|Genetics: 0.8% predicted probability; top 2%; 351 papers in training set
24. Briefings in Bioinformatics: 0.8% predicted probability; top 6%; 326 papers in training set
25. Life Science Alliance: 0.5% predicted probability; top 3%; 263 papers in training set
26. American Journal of Respiratory Cell and Molecular Biology: 0.5% predicted probability; top 0.9%; 38 papers in training set
27. Genome Research: 0.5% predicted probability; top 5%; 409 papers in training set