Breaking the sparsity barrier in clinical targeted-panel sequencing: Mapping the inherited determinants of mutational signatures
Ravid, A.; Ladany, H.; Gusev, A.; Maruvka, Y. E.
Cancer development is shaped by somatic mutational processes that leave characteristic patterns known as mutational signatures. The inherited determinants of variability in signature activity remain largely unknown. Common germline variants that regulate this activity, which we term Signature Quantitative Trait Loci (SigQTLs), are expected to have modest individual effects, requiring cohorts of tens of thousands of samples for reliable detection. Clinical targeted-panel sequencing datasets achieve this scale, but present a fundamental challenge: individual tumors typically harbor too few mutations for stable signature inference. To overcome this sparsity barrier, we introduce GroupSig, a framework that aggregates sparse mutational patterns across samples sharing a germline genotype into information-rich meta-samples, enabling robust signature inference at the population level. We validated GroupSig by recovering the well-established correlations between age and the clock-like signatures SBS1 and SBS5 using emulated panel data from The Cancer Genome Atlas (TCGA). We then applied GroupSig to approximately 32,000 tumor samples from the Dana-Farber Cancer Institute PROFILE cohort in a genome-wide SigQTL scan. We identified nine genome-wide significant SigQTLs, with the strongest signal at locus 16q24.3, where six variants were associated with increased SBS7 (UV exposure) activity. This association persisted after excluding melanoma samples, arguing against a tumor-type enrichment artifact. Validation in TCGA confirmed six SigQTLs, all at 16q24.3, where the implicated variants are eQTLs for CDK10 and SPG7 in skin tissue. Beyond genome-wide hits, DNA repair genes were 12.6-fold enriched among sub-threshold signals, supporting a polygenic architecture for mutational process regulation. GroupSig provides a scalable framework for germline-somatic association studies using panel sequencing data.
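The meta-sample idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_meta_samples` and `to_profile`, the toy four-context count vectors (real analyses typically use 96 trinucleotide contexts), and the choice of plain summation within genotype groups are all assumptions made here for illustration.

```python
def build_meta_samples(counts, genotypes):
    """Sum per-tumor mutation-context counts within each genotype group.

    counts:    list of equal-length lists, one row of context counts per tumor.
    genotypes: list of genotype labels, one per tumor (e.g. 0, 1, 2 copies
               of the alternate allele at a single germline variant).
    Returns a dict mapping each genotype label to its pooled count vector.
    Hypothetical helper; pooling by summation is an assumption.
    """
    if len(counts) != len(genotypes):
        raise ValueError("counts and genotypes must have the same length")
    meta = {}
    for row, genotype in zip(counts, genotypes):
        if genotype not in meta:
            meta[genotype] = [0] * len(row)
        # Add this tumor's sparse counts to its genotype group's meta-sample.
        meta[genotype] = [a + b for a, b in zip(meta[genotype], row)]
    return meta


def to_profile(vector):
    """Normalize a pooled count vector to proportions (zeros if empty)."""
    total = sum(vector)
    if total == 0:
        return [0.0] * len(vector)
    return [value / total for value in vector]


# Toy data: five tumors with only one or two mutations each.
counts = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
genotypes = [0, 0, 1, 1, 2]

meta = build_meta_samples(counts, genotypes)
for genotype in sorted(meta):
    print(genotype, meta[genotype], to_profile(meta[genotype]))
```

Each pooled vector holds more mutations than any single tumor contributes, which is the property that would make downstream signature fitting more stable. Signature inference on the meta-samples and the association test across genotype groups are left out of this sketch.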
Matching journals
The top 3 journals account for 50% of the predicted probability mass.