Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures

Dong, X.; Du, M. R. M.; Gouil, Q.; Tian, L.; Jabbari, J. S.; Bowden, R.; Baldoni, P. L.; Chen, Y.; Smyth, G. K.; Amarasinghe, S. L.; Law, C. W.; Ritchie, M. E.

2023-05-18 bioinformatics
10.1101/2022.07.22.501076 bioRxiv

The current lack of benchmark datasets with inbuilt ground-truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs ("sequins"). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground-truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed the other tools among the 6 isoform detection tools tested, and that DESeq2, edgeR and limma-voom were best among the 5 differential transcript expression tools tested. There was no clear front-runner among the 5 tools compared for differential transcript usage analysis, which suggests that further methods development is needed for this application.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. BMC Bioinformatics: 383 papers in training set; Top 0.3%; 18.7%
2. NAR Genomics and Bioinformatics: 214 papers in training set; Top 0.1%; 18.7%
3. PLOS ONE: 4510 papers in training set; Top 27%; 6.4%
4. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 0.8%; 4.9%
5. Briefings in Bioinformatics: 326 papers in training set; Top 1%; 4.9%

50% of probability mass above

6. BMC Genomics: 328 papers in training set; Top 0.5%; 4.3%
7. Scientific Reports: 3102 papers in training set; Top 31%; 4.0%
8. Genome Biology: 555 papers in training set; Top 2%; 4.0%
9. Bioinformatics: 1061 papers in training set; Top 6%; 3.3%
10. Frontiers in Bioinformatics: 45 papers in training set; Top 0.1%; 2.1%
11. PeerJ: 261 papers in training set; Top 5%; 2.1%
12. Nucleic Acids Research: 1128 papers in training set; Top 11%; 1.7%
13. PLOS Computational Biology: 1633 papers in training set; Top 16%; 1.7%
14. GigaScience: 172 papers in training set; Top 1%; 1.7%
15. RNA Biology: 70 papers in training set; Top 0.3%; 1.5%
16. Nature Communications: 4913 papers in training set; Top 56%; 1.2%
17. RNA: 169 papers in training set; Top 0.3%; 1.1%
18. Frontiers in Genetics: 197 papers in training set; Top 8%; 0.9%
19. Genomics: 60 papers in training set; Top 2%; 0.9%
20. Journal of Proteome Research: 215 papers in training set; Top 2%; 0.8%
21. Microbial Genomics: 204 papers in training set; Top 2%; 0.7%
22. International Journal of Molecular Sciences: 453 papers in training set; Top 15%; 0.7%
23. Bioinformatics Advances: 184 papers in training set; Top 5%; 0.7%
24. iScience: 1063 papers in training set; Top 32%; 0.7%
25. Biology Methods and Protocols: 53 papers in training set; Top 3%; 0.7%
26. Methods: 29 papers in training set; Top 0.8%; 0.6%
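
The 50% cutoff stated above can be checked with a running total of the listed probabilities. The following is a minimal sketch, not the site's own method: it assumes the final percentage on each entry is the predicted probability for that journal, and the helper name `rank_reaching` is hypothetical.

```python
# Predicted probabilities (percent) for ranks 1-26, copied from the list above.
# Assumption: the last percentage on each entry is the predicted probability.
probs = [
    18.7, 18.7, 6.4, 4.9, 4.9, 4.3, 4.0, 4.0, 3.3, 2.1, 2.1, 1.7, 1.7,
    1.7, 1.5, 1.2, 1.1, 0.9, 0.9, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6,
]


def rank_reaching(probabilities, threshold):
    """Return the 1-based rank at which the running total first reaches
    `threshold`, or None if the total never gets there."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank
    return None


# Hypothetical helper applied to the listed values.
cutoff_rank = rank_reaching(probs, 50.0)
print(cutoff_rank)
```

With the values as listed, the running total is about 48.7% after rank 4 and about 53.6% after rank 5, which matches the separator placed after the fifth journal.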