Considering Zeros in Single Cell Sequencing Data Correlation Analysis

Cai, G.; Yu, X.; Xiao, F.

2023-05-14 bioinformatics
10.1101/2023.05.13.540566 bioRxiv
Single-cell sequencing technology has enabled correlation analysis of genomic features at the cellular level. However, high levels of noise and sparsity in single-cell sequencing data make accurate assessment of correlations challenging. This study provides a toolkit, SCSC (https://github.com/thecailab/SCSC), for estimating correlation coefficients in single-cell sequencing data. The study comprehensively assessed four estimation strategies (classical, non-zero, dropout-weighted, imputation) and the impact of data features across various simulated scenarios. It found that filtering zeros significantly improves estimation accuracy, and that further improvement can be achieved by accounting for the dropout probability. The study also identified data features that affect correlation estimation, including expression level, library size, and biological variation.
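
To make the difference between the classical and non-zero strategies concrete, here is a minimal Python sketch. It is not the SCSC API: the function names `classical_corr` and `nonzero_corr`, the `min_cells` threshold, and the choice to keep only cells where both genes are non-zero are assumptions made for illustration, and the abstract does not specify the exact filtering rule the toolkit uses.

```python
import numpy as np


def classical_corr(x, y):
    """Pearson correlation over all cells, zeros included."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def nonzero_corr(x, y, min_cells=3):
    """Pearson correlation using only cells where both genes are non-zero.

    Assumption: a cell is dropped if either gene has a zero count.
    Returns NaN when fewer than `min_cells` cells remain (hypothetical cutoff).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (x > 0) & (y > 0)
    if keep.sum() < min_cells:
        return float("nan")
    return float(np.corrcoef(x[keep], y[keep])[0, 1])


# Toy expression vectors for two genes across six cells (made-up values).
# The zeros mimic dropout events that mask an otherwise linear relationship.
gene_a = [0, 1, 2, 3, 0, 4]
gene_b = [5, 2, 4, 6, 0, 8]

print("classical:", classical_corr(gene_a, gene_b))
print("non-zero: ", nonzero_corr(gene_a, gene_b))
```

In this toy case the cells with non-zero counts in both genes lie exactly on a line, so the non-zero estimate recovers a relationship that the zeros pull down in the classical estimate. The dropout-weighted and imputation strategies are not sketched here because the abstract gives no details on how they are computed.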

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Briefings in Bioinformatics: 326 papers in training set; Top 0.1%; 32.5%
2. BMC Bioinformatics: 383 papers in training set; Top 0.7%; 12.2%
3. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 0.8%; 8.3%
50% of probability mass above
4. BMC Genomics: 328 papers in training set; Top 0.3%; 6.2%
5. Bioinformatics: 1061 papers in training set; Top 5%; 4.8%
6. Genome Biology: 555 papers in training set; Top 2%; 4.8%
7. PLOS Computational Biology: 1633 papers in training set; Top 9%; 3.9%
8. Frontiers in Genetics: 197 papers in training set; Top 4%; 2.1%
9. GigaScience: 172 papers in training set; Top 1.0%; 2.0%
10. NAR Genomics and Bioinformatics: 214 papers in training set; Top 1%; 2.0%
11. Bioinformatics Advances: 184 papers in training set; Top 3%; 1.5%
12. Scientific Reports: 3102 papers in training set; Top 65%; 1.3%
13. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 7%; 0.9%
14. PLOS ONE: 4510 papers in training set; Top 65%; 0.9%
15. Genome Research: 409 papers in training set; Top 4%; 0.9%
16. Nature Communications: 4913 papers in training set; Top 60%; 0.9%
17. Nucleic Acids Research: 1128 papers in training set; Top 17%; 0.8%
18. Genome Medicine: 154 papers in training set; Top 8%; 0.7%
19. Advanced Science: 249 papers in training set; Top 20%; 0.7%
20. Communications Biology: 886 papers in training set; Top 27%; 0.7%
21. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set; Top 0.8%; 0.6%
22. Journal of Genetics and Genomics: 36 papers in training set; Top 3%; 0.6%
23. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 3%; 0.6%