Identification of 5mC within heterogenous tissue using de-novo somatic mutations

Wilcox, J. J. S.; Foucault, Q. J.; Gossmann, T. I.

2024-06-06 genomics
10.1101/2024.06.05.597613 bioRxiv
Tissues represent a fundamental evolutionary interface at the junction of genotype and phenotype. Indeed, gene regulation often occurs at the tissue level and manifests itself through tissue-specific epigenetic modifications. Studies investigating tissue epigenetics are limited by access to pure tissues. Tissues not only differ epigenetically, they are also subject to genetic differentiation through somatic mutations. As somatic mutations follow predictable patterns of inheritance, the application of population genomic approaches to inter- and intra-tissue variation could allow for the efficient detection of epigenetic modifications, even when tissue samples are convoluted. Here, we present an approach that uses de-novo somatic mutations to deconvolute 5mC methylation patterns through analysis of shifts in tissue-specific allele frequencies. We use simulations and bisulfite sequencing data to show that somatic mutations are common and detectable in next-generation sequencing data. We then use changes in mutation frequencies to accurately derive the proportional tissue of origin along a gradient of in silico subsamples of mixed-tissue bisulfite reads. We confirm that mixed tissues bias estimates of methylation levels and prevent detection of methylation differences at high levels of mixture. Our derived estimates of tissue contamination allow for unbiased and accurate deconvolution of mixed-tissue methylations in CpG and non-CpG context. We are ultimately able to recover 15-30% of differentially-methylated sites, and approximately 40-90% of differentially-methylated CpG islands and gene bodies in any cytosine context at contamination levels up to 90%. Our findings highlight the utility of population genomic approaches across scales, and expand the accessibility of epigenetics studies within evolutionary biology.
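The deconvolution idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple two-tissue linear mixing model, assumes that allele frequencies of somatic variants in the pure target and contaminating tissues are known, and assumes the contaminant's methylation level at a site is available. The function names `estimate_contamination` and `deconvolute_methylation` are hypothetical.

```python
from statistics import median


def estimate_contamination(vaf_mixed, vaf_target, vaf_contaminant):
    """Estimate the contaminant fraction c from somatic allele frequencies.

    Assumed mixing model per site: f_mixed = (1 - c) * f_target + c * f_contaminant.
    Each informative site gives one estimate of c; the median is returned.
    """
    estimates = []
    for f_m, f_t, f_c in zip(vaf_mixed, vaf_target, vaf_contaminant):
        if abs(f_c - f_t) < 1e-9:
            continue  # equal frequencies in both tissues carry no information
        c = (f_m - f_t) / (f_c - f_t)
        estimates.append(min(1.0, max(0.0, c)))  # clamp to a valid proportion
    if not estimates:
        raise ValueError("no informative sites")
    return median(estimates)


def deconvolute_methylation(meth_mixed, meth_contaminant, c):
    """Recover the target-tissue methylation level under the same assumed model.

    m_mixed = (1 - c) * m_target + c * m_contaminant, solved for m_target.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("c must be in [0, 1)")
    m_target = (meth_mixed - c * meth_contaminant) / (1.0 - c)
    return min(1.0, max(0.0, m_target))


# Toy data (invented for illustration): a sample with 25% contamination.
c_hat = estimate_contamination(
    vaf_mixed=[0.225, 0.10, 0.10],
    vaf_target=[0.30, 0.00, 0.10],
    vaf_contaminant=[0.00, 0.40, 0.10],
)
m_hat = deconvolute_methylation(meth_mixed=0.65, meth_contaminant=0.20, c=c_hat)
print(round(c_hat, 3), round(m_hat, 3))
```

The median is used here only to make the toy estimate robust to a single noisy site; the paper may well aggregate sites differently.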

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Genome Biology: 555 papers in training set; Top 0.1%; 22.3%
2. Genome Research: 409 papers in training set; Top 0.2%; 8.3%
3. The American Journal of Human Genetics: 206 papers in training set; Top 0.7%; 6.7%
4. Cell Genomics: 162 papers in training set; Top 0.8%; 4.8%
5. Nature Communications: 4913 papers in training set; Top 33%; 4.8%
6. Nature Biotechnology: 147 papers in training set; Top 2%; 3.9%
50% of probability mass above
7. PLOS Genetics: 756 papers in training set; Top 5%; 3.5%
8. Science: 429 papers in training set; Top 10%; 2.8%
9. Genetics: 225 papers in training set; Top 2%; 2.7%
10. Molecular Biology and Evolution: 488 papers in training set; Top 2%; 2.3%
11. Nature Genetics: 240 papers in training set; Top 3%; 2.3%
12. eLife: 5422 papers in training set; Top 34%; 2.3%
13. Genome Medicine: 154 papers in training set; Top 4%; 2.1%
14. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 30%; 1.9%
15. Cell Reports: 1338 papers in training set; Top 23%; 1.8%
16. Nature Methods: 336 papers in training set; Top 4%; 1.8%
17. GENETICS: 189 papers in training set; Top 0.6%; 1.8%
18. Bioinformatics: 1061 papers in training set; Top 7%; 1.8%
19. PLOS Computational Biology: 1633 papers in training set; Top 17%; 1.7%
20. Cell Reports Methods: 141 papers in training set; Top 3%; 1.5%
21. Nucleic Acids Research: 1128 papers in training set; Top 12%; 1.5%
22. Scientific Reports: 3102 papers in training set; Top 71%; 0.9%
23. Cell Systems: 167 papers in training set; Top 11%; 0.9%
24. Cell: 370 papers in training set; Top 15%; 0.9%
25. Molecular Ecology Resources: 161 papers in training set; Top 1%; 0.7%
26. G3 Genes|Genomes|Genetics: 351 papers in training set; Top 3%; 0.7%
27. PLOS ONE: 4510 papers in training set; Top 68%; 0.7%
28. Nature Ecology & Evolution: 113 papers in training set; Top 4%; 0.7%
29. Science Advances: 1098 papers in training set; Top 32%; 0.7%
30. BMC Genomics: 328 papers in training set; Top 7%; 0.6%
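
The "top 6 journals account for 50%" statement can be checked with a short cumulative sum. The sketch below assumes the cutoff is the smallest number of top-ranked journals whose summed predicted probability reaches the threshold; the helper name `journals_for_mass` is hypothetical, and the percentages are the first ten listed above.

```python
def journals_for_mass(probabilities, threshold):
    """Return how many top-ranked entries are needed to reach `threshold`.

    `probabilities` must already be sorted from highest to lowest rank.
    """
    cumulative = 0.0
    for count, p in enumerate(probabilities, start=1):
        cumulative += p
        if cumulative >= threshold:
            return count
    raise ValueError("threshold not reached by the listed probabilities")


# Predicted probabilities (in percent) for ranks 1-10 from the list above.
top_ten = [22.3, 8.3, 6.7, 4.8, 4.8, 3.9, 3.5, 2.8, 2.7, 2.3]
print(journals_for_mass(top_ten, 50.0))
```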