The consequences of variant calling decisions in secondary analyses of cancer sequencing data

Garcia-Prieto, C.; Valencia, A.; Porta-Pardo, E.

2020-01-30 bioinformatics
10.1101/2020.01.29.924860 bioRxiv

The analysis of cancer genomes provides fundamental information about cancer aetiology, the processes driving cell transformation and potential treatments. The first crucial step in the analysis of any tumor genome is the identification of the somatic genetic variants that cancer cells have acquired during their evolution. For that purpose, a wide range of somatic variant callers have been developed in recent years. While there have been some efforts to benchmark somatic variant calling tools and strategies, the extent to which variant calling decisions impact the results of downstream analyses of tumor genomes remains unknown. Here we present a study to elucidate whether different variant callers (MuSE, MuTect2, SomaticSniper, VarScan2) and strategies to combine them (Consensus and Union) lead to different results in three important downstream analyses of cancer genomics data: identification of cancer driver genes, quantification of mutational signatures and detection of clinically actionable variants. To this end, we tested how the results of these three analyses varied depending on the somatic mutation caller in five different projects from The Cancer Genome Atlas (TCGA). Our results show that variant calling decisions have a significant impact on these downstream analyses, creating important differences among variant call sets in driver gene identification and mutational process attribution, as well as in the detection of clinically actionable targets. More importantly, it seems that Consensus, a strategy very widely used by the research community, is not the optimal strategy, as it can lead to the loss of some cancer driver genes and actionable mutations. On the other hand, the Union seems to be a legitimate strategy for some downstream analyses, with robust performance overall.

Contact: eduard.porta@bsc.es; alfonso.valencia@bsc.es
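To make the two combination strategies concrete, the sketch below shows one common way to build Union and Consensus call sets from per-caller results. It is an illustration only, not the authors' pipeline: the variant keys, the toy calls and the `min_callers=2` threshold are assumptions, since the abstract does not state how many callers must agree for a variant to enter the Consensus set.

```python
from collections import Counter

# Toy data (hypothetical, not from the study). Each variant is keyed by
# (chromosome, position, reference allele, alternate allele).
calls = {
    "MuSE": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "MuTect2": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "T")},
    "SomaticSniper": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "VarScan2": {("chr1", 100, "A", "T")},
}


def union_calls(call_sets):
    """Variants reported by at least one caller."""
    return set().union(*call_sets.values())


def consensus_calls(call_sets, min_callers=2):
    """Variants reported by at least `min_callers` callers.

    The threshold is an assumption made for this sketch.
    """
    support = Counter(v for variants in call_sets.values() for v in variants)
    return {v for v, n in support.items() if n >= min_callers}


union = union_calls(calls)
consensus = consensus_calls(calls, min_callers=2)

# Variants kept by Union but dropped by Consensus: the kind of loss the
# abstract warns about when a real driver or actionable mutation is
# detected by a single caller only.
lost = union - consensus
print(sorted(lost))
```

In this toy example the variant on `chr3` is reported only by MuTect2, so it survives in the Union set but is dropped from the Consensus set.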

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | BMC Bioinformatics | 383 papers in training set | Top 0.2% | 22.3%
2 | Bioinformatics | 1061 papers in training set | Top 3% | 9.1%
3 | PeerJ | 261 papers in training set | Top 0.3% | 8.1%
4 | GigaScience | 172 papers in training set | Top 0.2% | 6.3%
5 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.1% | 4.8%
50% of probability mass above
6 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 0.9% | 4.8%
7 | Scientific Reports | 3102 papers in training set | Top 28% | 4.3%
8 | PLOS Computational Biology | 1633 papers in training set | Top 10% | 3.6%
9 | Biology Methods and Protocols | 53 papers in training set | Top 0.4% | 2.9%
10 | Cancers | 200 papers in training set | Top 2% | 2.6%
11 | Frontiers in Genetics | 197 papers in training set | Top 3% | 2.4%
12 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 1% | 2.1%
13 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 1% | 2.1%
14 | PLOS ONE | 4510 papers in training set | Top 51% | 1.9%
15 | European Journal of Human Genetics | 49 papers in training set | Top 0.6% | 1.7%
16 | Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.6%
17 | International Journal of Molecular Sciences | 453 papers in training set | Top 8% | 1.6%
18 | BMC Genomics | 328 papers in training set | Top 4% | 1.2%
19 | International Journal of Cancer | 42 papers in training set | Top 0.9% | 1.2%
20 | Bioinformatics Advances | 184 papers in training set | Top 4% | 0.8%
21 | BioData Mining | 15 papers in training set | Top 0.9% | 0.7%
22 | Artificial Intelligence in Medicine | 15 papers in training set | Top 0.8% | 0.7%
23 | F1000Research | 79 papers in training set | Top 5% | 0.7%
24 | Life | 27 papers in training set | Top 0.7% | 0.6%
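
The "50% of probability mass" cutoff in the list above can be checked with a running sum over the ranked probabilities. The sketch below is a minimal illustration using the five top-ranked percentages copied from the list; the function name `journals_to_reach` is hypothetical and this is not necessarily how the site computes the cutoff.

```python
from itertools import accumulate

# Predicted probabilities (%) of the five top-ranked journals, copied from the list.
top_ranked = [
    ("BMC Bioinformatics", 22.3),
    ("Bioinformatics", 9.1),
    ("PeerJ", 8.1),
    ("GigaScience", 6.3),
    ("Frontiers in Bioinformatics", 4.8),
]


def journals_to_reach(ranked, threshold):
    """Number of top-ranked entries needed for the cumulative share to reach
    `threshold`, or None if the threshold is never reached."""
    running = accumulate(share for _, share in ranked)
    for count, total in enumerate(running, start=1):
        if total >= threshold - 1e-9:  # small tolerance for float rounding
            return count
    return None


print(journals_to_reach(top_ranked, 50.0))
```

The first four journals sum to about 45.8%, and adding the fifth brings the total to about 50.6%, which is consistent with the statement that the top 5 journals account for 50% of the predicted probability mass.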