
Breaking the sparsity barrier in clinical targeted-panel sequencing: Mapping the inherited determinants of mutational signatures

Ravid, A.; Ladany, H.; Gusev, A.; Maruvka, Y. E.

2026-03-31 cancer biology
10.64898/2026.03.29.714525 bioRxiv

Cancer development is shaped by somatic mutational processes that leave characteristic patterns known as mutational signatures. The inherited determinants of variability in signature activity remain largely unknown. Common germline variants that regulate this activity, which we term Signature Quantitative Trait Loci (SigQTLs), are expected to have modest individual effects, requiring cohorts of tens of thousands of samples for reliable detection. Clinical targeted-panel sequencing datasets achieve this scale but present a fundamental challenge: individual tumors typically harbor too few mutations for stable signature inference. To overcome this sparsity barrier, we introduce GroupSig, a framework that aggregates sparse mutational patterns across samples sharing a germline genotype into information-rich meta-samples, enabling robust signature inference at the population level. We validated GroupSig by recovering the well-established correlations between age and the clock-like signatures SBS1 and SBS5 using emulated panel data from The Cancer Genome Atlas (TCGA). We then applied GroupSig to approximately 32,000 tumor samples from the Dana-Farber Cancer Institute PROFILE cohort in a genome-wide SigQTL scan. We identified 9 genome-wide significant SigQTLs, with the strongest signal at locus 16q24.3, where six variants were associated with increased SBS7 (UV exposure) activity. This association persisted after excluding melanoma samples, arguing against a tumor-type enrichment artifact. Validation in TCGA confirmed 6 SigQTLs, all at 16q24.3, where the implicated variants are eQTLs for CDK10 and SPG7 in skin tissue. Beyond genome-wide hits, DNA repair genes were 12.6-fold enriched among sub-threshold signals, supporting a polygenic architecture for mutational process regulation. GroupSig provides a scalable framework for germline-somatic association studies using panel sequencing data.
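The pooling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_meta_samples`, the dictionary-based data layout, and the toy four-channel count vectors are assumptions made for illustration (single-base-substitution catalogs conventionally use 96 trinucleotide channels), and the downstream signature-inference step is not shown.

```python
from collections import defaultdict


def build_meta_samples(counts, genotypes):
    """Pool per-sample mutation-channel counts within each genotype group.

    counts:    dict sample_id -> list of per-channel mutation counts
    genotypes: dict sample_id -> genotype label (e.g. 0, 1 or 2 alternate alleles)
    Returns:   dict genotype -> (pooled channel counts, number of samples pooled)
    """
    pooled = {}
    n_samples = defaultdict(int)
    for sample_id, vec in counts.items():
        g = genotypes.get(sample_id)
        if g is None:
            continue  # skip samples without a genotype call
        if g not in pooled:
            pooled[g] = [0] * len(vec)
        # Element-wise sum: sparse per-tumor counts accumulate into one profile
        pooled[g] = [a + b for a, b in zip(pooled[g], vec)]
        n_samples[g] += 1
    return {g: (pooled[g], n_samples[g]) for g in pooled}


# Hypothetical toy data: 4 channels, 4 tumors, one variant
counts = {
    "s1": [1, 0, 0, 2],
    "s2": [0, 1, 0, 1],
    "s3": [0, 0, 3, 0],
    "s4": [2, 0, 1, 0],
}
genotypes = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}

meta = build_meta_samples(counts, genotypes)
for g, (profile, n) in sorted(meta.items()):
    print(g, n, profile)
```

Each pooled profile carries more mutations than any single tumor, which is what would make a subsequent signature fit per genotype group more stable than a fit per sample.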

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Nature | 575 | Top 0.5% | 27.8% |
| 2 | Nature Genetics | 240 | Top 0.3% | 14.8% |
| 3 | Nature Communications | 4913 | Top 22% | 8.5% |
| | 50% of probability mass above | | | |
| 4 | Science | 429 | Top 4% | 6.9% |
| 5 | Cell Systems | 167 | Top 3% | 4.3% |
| 6 | Genome Medicine | 154 | Top 2% | 4.0% |
| 7 | Nature Cancer | 35 | Top 0.4% | 2.7% |
| 8 | Cancer Discovery | 61 | Top 0.8% | 2.4% |
| 9 | Genome Biology | 555 | Top 3% | 2.1% |
| 10 | Cell Reports | 1338 | Top 22% | 1.9% |
| 11 | Cancer Research | 116 | Top 2% | 1.8% |
| 12 | Nature Biotechnology | 147 | Top 4% | 1.7% |
| 13 | Nature Medicine | 117 | Top 2% | 1.7% |
| 14 | Nature Cell Biology | 99 | Top 3% | 1.5% |
| 15 | Science Translational Medicine | 111 | Top 3% | 1.3% |
| 16 | Cell Reports Medicine | 140 | Top 6% | 1.0% |
| 17 | Cell Genomics | 162 | Top 5% | 0.9% |
| 18 | Molecular Cell | 308 | Top 9% | 0.9% |
| 19 | Science Advances | 1098 | Top 28% | 0.8% |
| 20 | Proceedings of the National Academy of Sciences | 2130 | Top 44% | 0.8% |
| 21 | Cell | 370 | Top 17% | 0.8% |
| 22 | The American Journal of Human Genetics | 206 | Top 4% | 0.8% |
| 23 | Nucleic Acids Research | 1128 | Top 20% | 0.6% |
| 24 | Nature Methods | 336 | Top 7% | 0.6% |
| 25 | Genes & Development | 90 | Top 2% | 0.6% |
| 26 | eLife | 5422 | Top 64% | 0.5% |
| 27 | Developmental Cell | 168 | Top 13% | 0.5% |
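
The 50% cutoff in the ranking can be checked by accumulating the listed probabilities in rank order. The sketch below does this for the first five entries; the helper name `ranks_to_reach` is hypothetical, and the values are copied from the ranking rather than taken from any API.

```python
def ranks_to_reach(probs, threshold):
    """Return (rank, cumulative total) at which the running sum first meets the threshold."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None  # threshold never reached with the values given


# Predicted probabilities (%) for ranks 1 to 5, copied from the ranking
probs = [27.8, 14.8, 8.5, 6.9, 4.3]

rank, total = ranks_to_reach(probs, 50.0)
print(rank, round(total, 1))
```

With these values the running sum passes 50% at the third journal, consistent with the statement that the top 3 journals account for about half of the predicted probability mass.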