
Manipulating base quality scores enables variant calling from bisulfite sequencing alignments using conventional Bayesian approaches

Nunn, A.; Otto, C.; Stadler, P. F.; Langenberger, D.

2021-01-11 bioinformatics
10.1101/2021.01.11.425926 bioRxiv

Calling germline SNP variants from bisulfite-converted sequencing data is challenging for conventional software, which has no inherent capability to distinguish true polymorphisms from artificial mutations induced by the chemical treatment. Nevertheless, SNP data are desirable both for genotyping and for understanding the DNA methylome in the context of the genetic background. The confounding effect of bisulfite conversion can be resolved by comparing allele counts on a per-strand basis, since artificial mutations appear as non-complementary base pairs between the two strands. Herein, we present a computational pre-processing approach that adapts sequence alignment data so that conventional variant calling software such as GATK or FreeBayes can perform this kind of analysis downstream. Compared with specialised tools, the method represents a marked improvement in precision and sensitivity on high-quality, published benchmark datasets for both human and model plant variants.
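The per-strand idea above can be made concrete. The title indicates that the pre-processing works by manipulating base quality scores, but this summary does not state the exact rule, so what follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a directional library in which each read is already labelled as carrying C→T conversion (original top strand) or G→A conversion (original bottom strand, shown in forward-reference orientation). It applies one simple, hypothetical rule: zero the quality of every base that conversion could have produced (T on C→T reads, A on G→A reads), because such a base cannot distinguish a genomic T (or A) from a converted, unmethylated C (or G). The names `AlignedRead`, `AMBIGUOUS_BASE` and `mask_ambiguous_qualities` are illustrative only.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical minimal read: bases already in forward-reference orientation,
# per-base Phred qualities, and the bisulfite conversion the read is ASSUMED
# to carry ("CT" = original top strand, "GA" = original bottom strand).
@dataclass
class AlignedRead:
    bases: str
    quals: List[int]
    conversion: str


# Base that conversion can produce on each read type. Observing it cannot tell
# the genomic base apart from a converted (unmethylated) C, or G on the
# opposite strand, so it carries no reliable allele information.
AMBIGUOUS_BASE = {"CT": "T", "GA": "A"}


def mask_ambiguous_qualities(read: AlignedRead) -> AlignedRead:
    """Return a copy of `read` with the qualities of ambiguous bases set to 0.

    A caller that weights or filters observations by base quality then
    effectively ignores these bases; the input read is left unchanged.
    """
    if read.conversion not in AMBIGUOUS_BASE:
        raise ValueError(f"unknown conversion type: {read.conversion!r}")
    ambiguous = AMBIGUOUS_BASE[read.conversion]
    new_quals = [0 if base == ambiguous else q
                 for base, q in zip(read.bases, read.quals)]
    return AlignedRead(read.bases, new_quals, read.conversion)


if __name__ == "__main__":
    ct_read = AlignedRead("ACGTT", [30, 30, 30, 30, 30], "CT")
    ga_read = AlignedRead("AGGA", [20, 20, 20, 20], "GA")
    print(mask_ambiguous_qualities(ct_read).quals)  # [30, 30, 30, 0, 0]
    print(mask_ambiguous_qualities(ga_read).quals)  # [0, 20, 20, 0]
```

Under this rule, evidence for T at a C/T site comes only from G→A reads, and evidence for A at a G/A site comes only from C→T reads, so the caller sees allele counts only where the strands can be compared without conversion artefacts. The cost is that one strand contributes no T (or A) evidence anywhere, including at reference T (or A) positions. A real tool working on BAM files would also have to handle read orientation, mates, and non-directional libraries, none of which this sketch attempts.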

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. NAR Genomics and Bioinformatics (214 papers in training set; Top 0.1%; predicted probability 18.9%)
2. Bioinformatics (1061 papers in training set; Top 3%; predicted probability 7.2%)
3. Scientific Reports (3102 papers in training set; Top 17%; predicted probability 6.4%)
4. Nature Communications (4913 papers in training set; Top 28%; predicted probability 6.4%)
5. BMC Bioinformatics (383 papers in training set; Top 2%; predicted probability 6.4%)
6. Genome Biology (555 papers in training set; Top 2%; predicted probability 4.3%)
7. Frontiers in Plant Science (240 papers in training set; Top 2%; predicted probability 4.0%)
50% of probability mass above
8. Molecular Ecology Resources (161 papers in training set; Top 0.3%; predicted probability 3.6%)
9. Nucleic Acids Research (1128 papers in training set; Top 7%; predicted probability 3.3%)
10. Genetics (225 papers in training set; Top 1%; predicted probability 3.1%)
11. PLOS ONE (4510 papers in training set; Top 45%; predicted probability 2.6%)
12. Communications Biology (886 papers in training set; Top 4%; predicted probability 2.4%)
13. The Plant Journal (197 papers in training set; Top 2%; predicted probability 1.9%)
14. eLife (5422 papers in training set; Top 39%; predicted probability 1.8%)
15. New Phytologist (309 papers in training set; Top 3%; predicted probability 1.7%)
16. Briefings in Bioinformatics (326 papers in training set; Top 4%; predicted probability 1.5%)
17. Computational and Structural Biotechnology Journal (216 papers in training set; Top 5%; predicted probability 1.5%)
18. Bioinformatics Advances (184 papers in training set; Top 3%; predicted probability 1.5%)
19. International Journal of Molecular Sciences (453 papers in training set; Top 9%; predicted probability 1.3%)
20. Genome Medicine (154 papers in training set; Top 5%; predicted probability 1.3%)
21. ACS Synthetic Biology (256 papers in training set; Top 2%; predicted probability 1.0%)
22. BMC Genomics (328 papers in training set; Top 4%; predicted probability 1.0%)
23. Genome Research (409 papers in training set; Top 3%; predicted probability 0.9%)
24. GigaScience (172 papers in training set; Top 3%; predicted probability 0.8%)
25. Journal of Bioinformatics and Systems Biology (14 papers in training set; Top 0.6%; predicted probability 0.8%)
26. Plant Physiology (217 papers in training set; Top 3%; predicted probability 0.7%)
27. Molecular Biology and Evolution (488 papers in training set; Top 5%; predicted probability 0.5%)
28. Biology Methods and Protocols (53 papers in training set; Top 4%; predicted probability 0.5%)