
Extrinsic biological stochasticity and technical noise normalization of single-cell RNA sequencing data

Fang, M.; Pachter, L.

2025-05-12 bioinformatics
10.1101/2025.05.11.653373 bioRxiv

The technical noise introduced during single-cell RNA sequencing (scRNA-seq) has led to the use of size factor normalization as a first step prior to data analysis. However, this scaling approach inherently affects the extrinsic (between-cell) variability of gene expression, which stems from both biological and technical factors. Building on previous models of biological and technical extrinsic noise, we propose a general extrinsic noise model for scRNA-seq that provides a theoretical basis for size factor normalization and a framework for estimating both the biological and technical components of extrinsic noise. We highlight the relationship between normalized gene expression covariance, extrinsic noise, and overdispersion, showing that extrinsic noise explains the baseline overdispersion commonly observed in scRNA-seq data. We validated the technical model by testing this relationship on data from pooled RNA. Interestingly, our model accurately describes mature mRNA counts but not nascent mRNA counts, suggesting the need for an alternative technical model for data derived from nascent transcripts. Using scRNA-seq data, we characterize both biological and technical extrinsic noise, as well as cell size factors estimated using Poisson-like genes. Overall, our model helps clarify common misconceptions and provides insight into the role of extrinsic noise and size factor normalization in scRNA-seq data.
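The link between extrinsic noise and baseline overdispersion mentioned in the abstract can be illustrated with a generic toy simulation. The sketch below is not the authors' model: it assumes Poisson counts whose mean is scaled by a per-cell factor with mean 1, and the gamma distribution, the parameter values, and all function names are hypothetical choices made for illustration. Under these assumptions the law of total variance gives Var(X) = mean + CV²(factor) · mean², so the quantity (Var − mean) / mean² should approximate the squared coefficient of variation of the per-cell factor.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small means used here; not meant for large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(n_cells=20000, base_mean=5.0, shape=4.0, seed=0):
    """Simulate one gene across cells with a random per-cell scaling factor.

    Assumption: factors are gamma distributed with mean 1 and CV^2 = 1/shape.
    """
    rng = random.Random(seed)
    factors = [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_cells)]
    counts = [sample_poisson(rng, base_mean * s) for s in factors]
    return factors, counts


def overdispersion(counts):
    """Excess variance over Poisson, scaled by the squared mean."""
    mean = statistics.fmean(counts)
    var = statistics.pvariance(counts)
    return (var - mean) / mean**2


factors, counts = simulate()
factor_cv2 = statistics.pvariance(factors) / statistics.fmean(factors) ** 2
print(f"CV^2 of per-cell factor: {factor_cv2:.3f}")
print(f"estimated overdispersion: {overdispersion(counts):.3f}")
```

With the default settings both printed values should be close to 1/shape = 0.25, which is the sense in which cell-to-cell scaling alone produces a baseline overdispersion in this toy setting.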

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology; 1633 papers in training set; Top 1%; 18.4%
2. Bioinformatics; 1061 papers in training set; Top 2%; 12.4%
3. BMC Bioinformatics; 383 papers in training set; Top 1%; 6.7%
4. The Annals of Applied Statistics; 15 papers in training set; Top 0.1%; 4.3%
5. Biometrics; 22 papers in training set; Top 0.1%; 3.6%
6. Genetics; 225 papers in training set; Top 1%; 3.6%
7. Physical Biology; 43 papers in training set; Top 0.6%; 3.0%
50% of probability mass above
8. Scientific Reports; 3102 papers in training set; Top 44%; 2.7%
9. Biostatistics; 21 papers in training set; Top 0.1%; 2.7%
10. Journal of Bioinformatics and Systems Biology; 14 papers in training set; Top 0.1%; 2.6%
11. PLOS ONE; 4510 papers in training set; Top 45%; 2.6%
12. Nucleic Acids Research; 1128 papers in training set; Top 8%; 2.6%
13. Frontiers in Genetics; 197 papers in training set; Top 3%; 2.6%
14. NAR Genomics and Bioinformatics; 214 papers in training set; Top 1%; 2.3%
15. Biophysical Journal; 545 papers in training set; Top 2%; 2.1%
16. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 5%; 1.7%
17. Nature Communications; 4913 papers in training set; Top 52%; 1.7%
18. Genome Research; 409 papers in training set; Top 2%; 1.7%
19. BMC Genomics; 328 papers in training set; Top 3%; 1.3%
20. iScience; 1063 papers in training set; Top 25%; 0.9%
21. Nature Methods; 336 papers in training set; Top 6%; 0.9%
22. Genetic Epidemiology; 46 papers in training set; Top 0.7%; 0.9%
23. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 2%; 0.8%
24. Genome Biology; 555 papers in training set; Top 7%; 0.7%
25. Briefings in Bioinformatics; 326 papers in training set; Top 7%; 0.7%
26. Statistics in Medicine; 34 papers in training set; Top 0.4%; 0.7%
27. Journal of Computational Biology; 37 papers in training set; Top 0.8%; 0.6%
28. PLOS Genetics; 756 papers in training set; Top 17%; 0.6%
29. BioData Mining; 15 papers in training set; Top 1%; 0.6%
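
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The following is a minimal sketch; the helper name `ranks_to_reach` is hypothetical, and only the first ten probabilities from the list are included.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1 to 10, as listed above.
probs = [18.4, 12.4, 6.7, 4.3, 3.6, 3.6, 3.0, 2.7, 2.7, 2.6]


def ranks_to_reach(probabilities, threshold=50.0):
    """Return the first rank at which the cumulative probability reaches threshold."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None  # threshold not reached with the given entries


print(ranks_to_reach(probs))  # prints 7
```

The cumulative sum is about 49.0% after rank 6 and about 52.0% after rank 7, which matches the position of the "50% of probability mass above" marker.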