MethylScore, a pipeline for accurate and context-aware identification of differentially methylated regions from population-scale plant WGBS data

Hüther, P.; Hagmann, J.; Nunn, A.; Kakoulidou, I.; Pisupati, R.; Langenberger, D.; Weigel, D.; Johannes, F.; Schultheiss, S. J.; Becker, C.

2022-01-06 plant biology
10.1101/2022.01.06.475031 bioRxiv
Whole-genome bisulfite sequencing (WGBS) is the standard method for profiling DNA methylation at single-nucleotide resolution. Many WGBS-based studies aim to identify biologically relevant loci that display differential methylation between genotypes, treatment groups, tissues, or developmental stages. Over the years, different tools have been developed to extract differentially methylated regions (DMRs) from whole-genome data. Often, such tools are built upon assumptions from mammalian data and do not consider the substantially more complex and variable nature of plant DNA methylation. Here, we present MethylScore, a pipeline to analyze WGBS data and to account for plant-specific DNA methylation properties. MethylScore processes data from genomic alignments to DMR output and is designed to be usable by novice and expert users alike. It uses an unsupervised machine learning approach to segment the genome by classification into states of high and low methylation, substantially reducing the number of necessary statistical tests while increasing the signal-to-noise ratio and the statistical power. We show how MethylScore can identify DMRs from hundreds of samples and how its data-driven approach can stratify associated samples without prior information. We identify DMRs in the A. thaliana 1001 Genomes dataset to unveil known and unknown genotype-epigenotype associations. MethylScore is an accessible pipeline for plant WGBS data, with unprecedented features for DMR calling in small- and large-scale datasets; it is built as a Nextflow pipeline and its source code is available at https://github.com/Computomics/MethylScore.
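
The abstract does not say which model performs the segmentation into high- and low-methylation states. As a hedged illustration only, the sketch below assumes a two-state hidden Markov model with binomial emissions and decodes the most likely state path with the Viterbi algorithm. The function names (`log_binom_like`, `segment`) and the parameter values (`p_low`, `p_high`, `p_stay`) are hypothetical and fixed by hand; an unsupervised approach would estimate them from the data, and nothing here is taken from the MethylScore source code.

```python
import math

def log_binom_like(meth, total, p):
    # Log-likelihood of `meth` methylated reads out of `total`, dropping the
    # binomial coefficient because it is identical for both states.
    return meth * math.log(p) + (total - meth) * math.log(1 - p)

def segment(counts, p_low=0.05, p_high=0.8, p_stay=0.9):
    """Viterbi decoding for a two-state model (0 = low, 1 = high methylation).

    counts: list of (methylated_reads, total_reads) per cytosine, in genome order.
    All parameter values are illustrative assumptions, not fitted estimates.
    """
    states = (p_low, p_high)
    log_stay, log_switch = math.log(p_stay), math.log(1 - p_stay)
    # Uniform prior over the two states at the first site.
    score = [math.log(0.5) + log_binom_like(*counts[0], p) for p in states]
    back = []
    for meth, total in counts[1:]:
        new_score, pointers = [], []
        for s, p in enumerate(states):
            cands = [score[prev] + (log_stay if prev == s else log_switch)
                     for prev in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_binom_like(meth, total, p))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

sites = [(0, 10), (1, 10), (0, 10), (9, 10), (8, 10), (10, 10), (0, 10), (0, 10)]
print(segment(sites))
```

Segmenting first means that downstream statistical tests only need to run on candidate regions rather than on every cytosine, which is the reduction in test count the abstract refers to. In this sketch, the transition penalty also keeps a single low-coverage outlier from splitting a segment.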

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Genome Biology | 555 | Top 0.1% | 34.9%
2 | Nature Biotechnology | 147 | Top 0.6% | 10.3%
3 | Genome Research | 409 | Top 0.2% | 8.5%
(50% of probability mass above)
4 | Nature Communications | 4913 | Top 25% | 7.3%
5 | Nature Methods | 336 | Top 2% | 4.9%
6 | Nature | 575 | Top 10% | 1.8%
7 | Plant Communications | 35 | Top 0.7% | 1.8%
8 | Nature Plants | 84 | Top 0.9% | 1.8%
9 | Cell Systems | 167 | Top 7% | 1.7%
10 | GigaScience | 172 | Top 2% | 1.4%
11 | Nucleic Acids Research | 1128 | Top 13% | 1.4%
12 | The Plant Journal | 197 | Top 3% | 1.2%
13 | New Phytologist | 309 | Top 4% | 1.2%
14 | Molecular Plant | 36 | Top 1% | 1.0%
15 | Genome Medicine | 154 | Top 7% | 0.9%
16 | BMC Genomics | 328 | Top 5% | 0.8%
17 | PLOS ONE | 4510 | Top 65% | 0.8%
18 | Bioinformatics | 1061 | Top 9% | 0.8%
19 | NAR Genomics and Bioinformatics | 214 | Top 4% | 0.7%
20 | Proceedings of the National Academy of Sciences | 2130 | Top 45% | 0.7%
21 | eLife | 5422 | Top 59% | 0.7%
22 | Scientific Reports | 3102 | Top 78% | 0.7%
23 | The American Journal of Human Genetics | 206 | Top 4% | 0.7%
24 | Genetics | 225 | Top 5% | 0.7%
25 | Science Advances | 1098 | Top 32% | 0.7%
26 | Nature Computational Science | 50 | Top 2% | 0.7%
27 | Journal of Molecular Biology | 217 | Top 4% | 0.7%
28 | Plant Physiology | 217 | Top 3% | 0.7%
29 | PLOS Genetics | 756 | Top 18% | 0.5%
30 | Genomics, Proteomics & Bioinformatics | 171 | Top 7% | 0.5%
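
The 50% cutoff marked in the ranking can be checked by accumulating the listed probabilities in rank order until the running total reaches the threshold. The sketch below uses the first five probabilities from the ranking; the function name `journals_to_reach` is hypothetical and is not part of any tool mentioned on this page.

```python
def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running sum
    of `probabilities` (in percent, sorted descending) to reach `threshold`.
    Returns None if the threshold is never reached."""
    running = 0.0
    for count, p in enumerate(probabilities, start=1):
        running += p
        if running >= threshold:
            return count
    return None

# Probabilities (percent) for ranks 1-5 as listed in the ranking.
top_probs = [34.9, 10.3, 8.5, 7.3, 4.9]
print(journals_to_reach(top_probs, 50.0))
```

With these values the running total is 45.2% after two journals and 53.7% after three, which matches the statement that the top 3 journals account for 50% of the predicted probability mass.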