Assessing the impact of parental linear gene normalization on the performance of statistical models for circular RNA differential expression analysis

Qorri, E.; Varga, V.; Priskin, K.; Latinovics, D.; Takacs, B.; Pekker, E.; Jaksa, G.; Csanyi, B.; Torday, L.; Bassam, A.; Kahan, Z.; Pinter, L.; Haracska, L.

2026-03-09 bioinformatics
10.64898/2026.03.06.710045 bioRxiv
Background: Circular RNAs (circRNAs) have emerged as promising non-invasive cancer biomarkers due to their stability, abundance in body fluids, and regulatory potential. However, circRNA differential expression analysis (DEA) remains challenging, largely owing to a lack of consensus on important preprocessing strategies such as filtering and normalization. While well-established bulk RNA-sequencing frameworks are commonly applied to circRNA data, newer approaches such as CIRI-DE (part of the CIRI3 suite) integrate both linear and circular transcript information to improve detection. Despite these developments, an assessment of these integrative strategies is lacking, and the critical impact of filtering on DEA model performance has not been comprehensively evaluated.

Results: In this study, we evaluated the impact of multiple normalization and filtering strategies on circRNA DEA using five experimental datasets, including two in-house blood platelet sets, and semi-parametric simulated in silico datasets. Our results emphasize the importance of selecting an appropriate filtering threshold, as overly lenient filtering substantially reduced model performance across datasets. We found edgeR's filterByExpr() strategy particularly effective in handling zero counts in circRNA data, while also generating the most reliable results across most datasets. Furthermore, by incorporating linear and circular information as described in CIRI-DE, most methods identified a higher number of differentially expressed (DE) circRNAs than with circular counts alone. Notably, circRNAs identified by both CIRI-DE and the modified bulk RNA-sequencing pipelines showed substantial overlap.

Conclusion: Our findings demonstrate that automated filtering combined with linear-aware normalization significantly enhances the sensitivity and reproducibility of circRNA DEA, providing a standardized framework for more reliable biomarker discovery in transcriptomic research.
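To make the filtering step more concrete, the following is a minimal Python sketch of an expression filter in the spirit of edgeR's filterByExpr(). It is not the authors' pipeline and not edgeR's implementation: it assumes the commonly documented defaults (a minimum count of 10 and a minimum total count of 15), takes the smallest group size as the required number of samples, and omits edgeR's adjustment for large groups. The function name `filter_by_expr_like` and the toy features `circA` to `circD` are hypothetical.

```python
from statistics import median


def filter_by_expr_like(counts, groups, min_count=10, min_total_count=15):
    """Simplified, filterByExpr-inspired filter (assumption: not edgeR's exact logic).

    counts: dict mapping feature name -> list of raw counts, one per sample
    groups: list of group labels, one per sample
    Returns a dict mapping feature name -> True if the feature is kept.
    """
    n_samples = len(groups)
    # Library size of each sample = column sum over all features.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    # CPM cutoff corresponding to `min_count` reads in a median-sized library.
    cpm_cutoff = min_count / median(lib_sizes) * 1e6
    # Require the cutoff to be met in at least as many samples as the smallest group.
    min_samples = min(groups.count(g) for g in set(groups))

    keep = {}
    for feature, row in counts.items():
        cpm = [c / lib * 1e6 if lib > 0 else 0.0 for c, lib in zip(row, lib_sizes)]
        n_pass = sum(value >= cpm_cutoff for value in cpm)
        keep[feature] = n_pass >= min_samples and sum(row) >= min_total_count
    return keep


# Hypothetical toy data: four samples, two per group, library size 1000 each.
toy_counts = {
    "circA": [20, 30, 25, 35],
    "circB": [0, 1, 0, 0],       # almost all zeros: expected to be dropped
    "circC": [15, 0, 0, 12],     # expressed in two samples: kept
    "circD": [965, 969, 975, 953],
}
toy_groups = ["ctrl", "ctrl", "case", "case"]
kept = filter_by_expr_like(toy_counts, toy_groups)
```

In this toy example a feature dominated by zeros (`circB`) is removed, while one expressed in as many samples as the smallest group (`circC`) is retained. That is the kind of zero-count handling the abstract attributes to automated filtering.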

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | BMC Bioinformatics | 383 | Top 0.2% | 22.9%
2 | Computational and Structural Biotechnology Journal | 216 | Top 0.1% | 10.3%
3 | PLOS ONE | 4510 | Top 18% | 10.3%
4 | PeerJ | 261 | Top 0.4% | 6.9%
(50% of probability mass above)
5 | Bioinformatics | 1061 | Top 5% | 4.4%
6 | NAR Genomics and Bioinformatics | 214 | Top 0.6% | 3.9%
7 | BMC Genomics | 328 | Top 0.8% | 3.7%
8 | Briefings in Bioinformatics | 326 | Top 2% | 3.7%
9 | Scientific Reports | 3102 | Top 41% | 3.1%
10 | Frontiers in Genetics | 197 | Top 3% | 2.8%
11 | Biology Methods and Protocols | 53 | Top 0.5% | 2.1%
12 | Bioinformatics Advances | 184 | Top 2% | 1.9%
13 | PLOS Computational Biology | 1633 | Top 18% | 1.4%
14 | International Journal of Molecular Sciences | 453 | Top 10% | 1.2%
15 | Frontiers in Bioinformatics | 45 | Top 0.5% | 1.0%
16 | BioData Mining | 15 | Top 0.6% | 0.9%
17 | Genes | 126 | Top 2% | 0.9%
18 | Cancer Research Communications | 46 | Top 1.0% | 0.8%
19 | Computers in Biology and Medicine | 120 | Top 4% | 0.8%
20 | Clinical Chemistry | 22 | Top 0.8% | 0.8%
21 | GigaScience | 172 | Top 3% | 0.7%
22 | PROTEOMICS | 35 | Top 0.8% | 0.7%
23 | Journal of Translational Medicine | 46 | Top 3% | 0.7%
24 | RNA Biology | 70 | Top 0.6% | 0.7%
25 | Computational Biology and Chemistry | 23 | Top 0.8% | 0.5%
26 | Genomics | 60 | Top 4% | 0.5%