Critical Differential Expression Assessment for Individual Bulk RNA-Seq Projects

Warden, C. D.; Wu, X.

2024-02-12 genomics
10.1101/2024.02.10.579728 bioRxiv

Finding the right balance of quality and quantity can be important, and it is essential that project quality does not drop below the level where important main conclusions are missed or misstated. We use knock-out and over-expression studies as a simplification to test recovery of a known causal gene in RNA-Seq cell line experiments. When single-end RNA-Seq reads are aligned with STAR and quantified with htseq-count, we found potential value in more frequently testing the Generalized Linear Model (GLM) implementation of edgeR with robust dispersion estimation for either single-variate or multi-variate 2-group comparisons (with the possibility of defining criteria less stringent than |fold-change| > 1.5 and FDR < 0.05). When considering a limited number of patient sample comparisons with larger sample sizes, there might be some decreased variability between methods (except for DESeq1). However, the ranking of the gene identified using immunohistochemistry (for ER/PR/HER2 in breast cancer samples from The Cancer Genome Atlas) showed a possible shift in performance compared to the cell line comparisons, potentially highlighting utility for standard statistical tests and/or limma-based analysis with larger sample sizes. If this continues to be true in additional studies and comparisons, it would be consistent with the possibility that it may be important to allocate time for potential methods troubleshooting in genomics projects. The analysis of public data presented in this study does not consider all experimental designs, and presentation of downstream analysis is limited, so any estimate from this simplification would underestimate the true need for some methods testing for every project. Additionally, this set of independent cell line experiments is limited in its ability to determine the frequency of missing a highly important gene if the problem is rare (such as 10% or lower). For example, if only one method could be tested for "initial" analysis, it is not completely clear to what extent edgeR-robust might perform better than DESeq2 in the cell line experiments. Importantly, we do not wish to cause undue concern, and we believe that it should often be possible to define a differential expression workflow that is suitable for some purposes for many samples. Nevertheless, we provide a variety of measures that we believe emphasize the need to critically assess every individual project and maximize confidence in published results.
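
The recovery test described in the abstract (does a known causal gene pass the |fold-change| > 1.5 and FDR < 0.05 criteria, and where does it rank among all genes?) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the gene names, and all numeric values are hypothetical. Ranking by ascending p-value and applying the fold-change cutoff on the log2 scale, as |log2FC| > log2(1.5), are assumptions made for illustration.

```python
import math

def passes_cutoffs(log2_fc, fdr, fc_cutoff=1.5, fdr_cutoff=0.05):
    """True if a gene meets both criteria: |fold-change| > fc_cutoff and FDR < fdr_cutoff.

    The linear fold-change cutoff is applied on the log2 scale (an assumption),
    so up- and down-regulation are treated symmetrically.
    """
    return abs(log2_fc) > math.log2(fc_cutoff) and fdr < fdr_cutoff

def rank_of_gene(results, target):
    """1-based rank of `target` when genes are sorted by ascending p-value.

    Returns None if the gene is absent from the results.
    """
    ordered = sorted(results, key=lambda row: row["pvalue"])
    for position, row in enumerate(ordered, start=1):
        if row["gene"] == target:
            return position
    return None

# Hypothetical differential expression output for three genes (made-up values).
toy_results = [
    {"gene": "GENE_B", "log2_fc": 0.4, "pvalue": 1e-4, "fdr": 0.01},
    {"gene": "GENE_A", "log2_fc": -2.1, "pvalue": 1e-8, "fdr": 1e-5},
    {"gene": "GENE_C", "log2_fc": 1.0, "pvalue": 0.02, "fdr": 0.20},
]

# Genes that satisfy both cutoffs.
recovered_genes = [
    row["gene"] for row in toy_results
    if passes_cutoffs(row["log2_fc"], row["fdr"])
]

# Suppose GENE_A is the knocked-out gene in this hypothetical experiment.
print("Recovered:", recovered_genes)
print("Rank of GENE_A:", rank_of_gene(toy_results, "GENE_A"))
```

Running the same check on the output of each method under comparison (for example, tables exported from edgeR and DESeq2) would show whether the known gene is recovered and how its rank shifts between methods, under the assumptions stated above.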

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. BMC Genomics | 328 papers in training set | Top 0.1% | 32.3%
2. PeerJ | 261 papers in training set | Top 0.1% | 9.9%
3. BMC Bioinformatics | 383 papers in training set | Top 1% | 8.2%
(50% of probability mass above)
4. GigaScience | 172 papers in training set | Top 0.2% | 6.2%
5. PLOS ONE | 4510 papers in training set | Top 29% | 6.2%
6. NAR Genomics and Bioinformatics | 214 papers in training set | Top 0.6% | 3.9%
7. F1000Research | 79 papers in training set | Top 0.5% | 3.6%
8. Frontiers in Genetics | 197 papers in training set | Top 3% | 2.8%
9. PLOS Computational Biology | 1633 papers in training set | Top 14% | 2.0%
10. Scientific Reports | 3102 papers in training set | Top 61% | 1.6%
11. Genomics | 60 papers in training set | Top 1% | 1.3%
12. Frontiers in Bioinformatics | 45 papers in training set | Top 0.4% | 1.3%
13. Bioinformatics | 1061 papers in training set | Top 8% | 1.2%
14. Genetic Epidemiology | 46 papers in training set | Top 0.7% | 0.9%
15. Genome Biology | 555 papers in training set | Top 7% | 0.9%
16. Life Science Alliance | 263 papers in training set | Top 1% | 0.8%
17. Genome Research | 409 papers in training set | Top 4% | 0.7%
18. PLOS Genetics | 756 papers in training set | Top 15% | 0.7%
19. Database | 51 papers in training set | Top 1% | 0.7%
20. Nature Communications | 4913 papers in training set | Top 64% | 0.7%
21. International Journal of Molecular Sciences | 453 papers in training set | Top 18% | 0.6%
22. Microbial Genomics | 204 papers in training set | Top 3% | 0.6%
23. Nucleic Acids Research | 1128 papers in training set | Top 20% | 0.6%