
Robust data-driven gene expression inference for RNA-seq using curated intergenic regions

Brandulas Cammarata, A.; Fonseca Costa, S. S.; Rosikiewicz, M.; Roux, J.; Wollbrett, J.; Bastian, F. B.; Robinson-Rechavi, M.

2026-05-20 genomics
10.1101/2022.03.31.486555 bioRxiv

RNA-Seq is a powerful technique for obtaining quantitative information on gene expression. While many applications focus on measuring expression levels, accurately distinguishing between actively and inactively transcribed genes is equally important for understanding gene function, development, and disease mechanisms. However, setting a biologically meaningful threshold for calling genes expressed is challenging because noise levels vary across protocols, experiments, and biological samples. We propose to define this threshold per sample relative to the background level observed in inactive genomic features, inferred from the number of reads mapped to intergenic regions of the genome, and to call genes expressed if their expression level is significantly higher than the estimated background noise. This approach can be applied to a single RNA-Seq library as well as to a combination of libraries from the same condition, in model and non-model organisms. We show that our method yields a more accurate prediction of expression state than existing methods, as illustrated by consistent expression calls for biological replicates in the same tissue.
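
The per-sample idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that expression values are already on a log scale, that the background measured in intergenic regions is roughly normal, and that a one-sided test at a hypothetical significance level `alpha` decides the call. The function name `call_expressed` and the toy values are invented for this example.

```python
import math
import statistics


def call_expressed(gene_log_expr, intergenic_log_expr, alpha=0.05):
    """Call a gene expressed if it lies significantly above the background.

    gene_log_expr: dict mapping gene id -> log-scale expression in one sample.
    intergenic_log_expr: log-scale expression of intergenic regions in the
        same sample, used here as the background noise estimate (assumption).
    alpha: hypothetical significance cutoff for the one-sided test.
    """
    # Estimate the background from the intergenic regions of this sample only.
    mu = statistics.mean(intergenic_log_expr)
    sigma = statistics.stdev(intergenic_log_expr)

    calls = {}
    for gene, value in gene_log_expr.items():
        z = (value - mu) / sigma
        # Upper-tail p-value under the assumed normal background model.
        p_value = 0.5 * math.erfc(z / math.sqrt(2))
        calls[gene] = p_value < alpha
    return calls


# Toy data for a single library (made-up values, log scale).
intergenic = [-1.2, -0.8, -1.0, -0.5, -1.5, -0.9, -1.1, -0.7]
genes = {"geneA": 4.0, "geneB": -0.9, "geneC": -0.6}
calls = call_expressed(genes, intergenic)
```

Because the background is re-estimated for each library, the cutoff moves with the noise level of that sample rather than being fixed across protocols or experiments.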

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Genome Research: 409 papers in training set, Top 0.1%, 18.4%
2. Nucleic Acids Research: 1128 papers in training set, Top 0.9%, 14.5%
3. Genome Biology: 555 papers in training set, Top 0.8%, 7.1%
4. NAR Genomics and Bioinformatics: 214 papers in training set, Top 0.2%, 6.7%
5. Bioinformatics: 1061 papers in training set, Top 4%, 6.3%

50% of probability mass above

6. PLOS ONE: 4510 papers in training set, Top 32%, 4.8%
7. Nature Biotechnology: 147 papers in training set, Top 2%, 3.9%
8. Cell Reports Methods: 141 papers in training set, Top 0.7%, 3.8%
9. Nature Communications: 4913 papers in training set, Top 40%, 3.6%
10. Nature Methods: 336 papers in training set, Top 3%, 3.0%
11. iScience: 1063 papers in training set, Top 8%, 2.4%
12. BMC Genomics: 328 papers in training set, Top 2%, 2.1%
13. PLOS Computational Biology: 1633 papers in training set, Top 15%, 1.9%
14. Scientific Reports: 3102 papers in training set, Top 55%, 1.8%
15. PLOS Genetics: 756 papers in training set, Top 9%, 1.7%
16. BMC Bioinformatics: 383 papers in training set, Top 5%, 1.6%
17. Frontiers in Genetics: 197 papers in training set, Top 7%, 1.1%
18. Genetics: 225 papers in training set, Top 4%, 0.9%
19. eLife: 5422 papers in training set, Top 54%, 0.9%
20. Nature Genetics: 240 papers in training set, Top 7%, 0.8%
21. Cell Reports: 1338 papers in training set, Top 34%, 0.7%
22. Genome Medicine: 154 papers in training set, Top 9%, 0.6%
23. Briefings in Bioinformatics: 326 papers in training set, Top 8%, 0.6%