
Deconvolution-based cell-type specific DNA methylation-wide and transcriptome-wide association studies identify risk CpG sites and genes associated with colorectal cancer risk

Li, Q.; Xu, L.; Wang, J.; Li, C.; Wen, W.; Shu, X.; Yang, Y.; Shu, X.-o.; Cai, Q.; Long, J.; Singh, B.; Lau, K. S.; Yin, Z.; Casey, G.; Song, M.; Peters, U.; Zheng, W.; Guo, X.

2026-06-12 genetic and genomic medicine
10.64898/2026.06.11.26355460 medRxiv

Bulk tissue-based DNA methylation-wide (MWAS) and transcriptome-wide association studies (TWAS) have identified CpG sites and genes associated with colorectal cancer (CRC) risk, but do not account for cellular heterogeneity. To address this, we developed a deconvolution-informed framework to infer cell-type specific DNA methylation and gene expression profiles from bulk normal colon tissues using reference single-cell epigenomic and transcriptomic datasets. We performed cell-type specific MWAS (ctMWAS) using deconvoluted DNA methylation data from 293 normal colon samples and conducted cell-type specific TWAS (ctTWAS) using deconvoluted gene expression data from 707 normal colon samples. Genetically predicted methylation and expression models were integrated with CRC GWAS summary statistics (78,473 cases and 107,143 controls) to identify risk-associated CpG sites and genes. Through ctMWAS, ctTWAS, and colocalization analyses, we identified 178 significant cell-type-specific CpG sites in 106 loci and 68 risk genes in 40 loci, including 26 previously unreported loci. Through additional integrative methylation-gene analysis, we prioritized 132 candidate risk genes, the majority of which were supported by multi-omics evidence and stage-specific dysregulation across the adenoma-carcinoma and serrated-carcinoma progression pathways. Pathway enrichment analyses implicated pathways involved in DNA double-strand break repair, TP53 regulation, TGF-β signaling, and innate immune responses. Among prioritized genes, 14 were identified as putative druggable targets linked to 90 FDA-approved or clinical-stage drugs. Experimental validation supports an oncogenic role for SF3A3. These findings demonstrate that deconvolution-informed integrative analyses enable cell-type-resolved identification of epigenetic and transcriptional mechanisms underlying CRC susceptibility and provide insights into disease biology, prevention, and therapeutic target discovery.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Genome Medicine (154 papers in training set; Top 0.3%; 14.1%)
2. Nature Communications (4913 papers in training set; Top 12%; 14.1%)
3. Cell Genomics (162 papers in training set; Top 0.4%; 7.1%)
4. Nature Genetics (240 papers in training set; Top 1%; 6.2%)
5. The American Journal of Human Genetics (206 papers in training set; Top 0.8%; 6.2%)
6. Cell Systems (167 papers in training set; Top 3%; 4.8%)
50% of probability mass above
7. Cell (370 papers in training set; Top 4%; 4.8%)
8. Med (38 papers in training set; Top 0.1%; 3.0%)
9. Cancer Discovery (61 papers in training set; Top 0.9%; 2.4%)
10. Nature Biomedical Engineering (42 papers in training set; Top 0.5%; 2.4%)
11. Genome Biology (555 papers in training set; Top 4%; 2.0%)
12. Cancer Research (116 papers in training set; Top 2%; 1.8%)
13. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 31%; 1.8%)
14. Nature (575 papers in training set; Top 11%; 1.7%)
15. Journal of Clinical Investigation (164 papers in training set; Top 3%; 1.7%)
16. Developmental Cell (168 papers in training set; Top 8%; 1.7%)
17. Cell Reports Medicine (140 papers in training set; Top 5%; 1.5%)
18. Science (429 papers in training set; Top 16%; 1.3%)
19. Nucleic Acids Research (1128 papers in training set; Top 13%; 1.3%)
20. eLife (5422 papers in training set; Top 48%; 1.3%)
21. npj Precision Oncology (48 papers in training set; Top 0.8%; 1.3%)
22. Science Advances (1098 papers in training set; Top 22%; 1.3%)
23. Scientific Reports (3102 papers in training set; Top 71%; 0.9%)
24. Nature Medicine (117 papers in training set; Top 5%; 0.8%)
25. JCI Insight (241 papers in training set; Top 8%; 0.7%)
26. iScience (1063 papers in training set; Top 35%; 0.7%)
27. Nature Biotechnology (147 papers in training set; Top 8%; 0.7%)
28. Communications Biology (886 papers in training set; Top 27%; 0.7%)
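
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, the values are the rounded percentages for the first ten ranks copied from the list above, and the running total is therefore approximate.

```python
# Rounded predicted probabilities (percent) for ranks 1-10, copied from the list above.
probs = [14.1, 14.1, 7.1, 6.2, 6.2, 4.8, 4.8, 3.0, 2.4, 2.4]


def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative total) at which the running sum first reaches threshold.

    Hypothetical helper for illustration; returns None if the threshold is never reached.
    """
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


rank, total = journals_to_reach(probs, 50.0)
print(f"Threshold reached at rank {rank} with about {total:.1f}% cumulative mass")
```

With the rounded values, the running total after rank 5 is still below 50% and crosses it at rank 6, which agrees with the divider placed after Cell Systems.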