The Impact of Non-coding G-quadruplex Variants on Human Traits and Disease Susceptibility

Sharma, R.; Hu, F.; Li, X.; Campos, R.; Kundu, K.; Atanur, S.; Karpinski, M.; Wasilewski, S.; MacArthur, S.; Vitsios, D.; Dhindsa, R. S.; Georgakopoulos-Soares, I.; Burren, O. S.; Petrovski, S.; Mustoe, A. M.; Wang, Q.; Glodzik, D.; Zou, X. Z.

2026-06-01 genetic and genomic medicine
10.64898/2026.05.29.26354456 medRxiv

Non-coding variants are important contributors to human traits and diseases, but linking them to molecular mechanisms and phenotypes at scale remains challenging. G-quadruplexes (G4s) are four-stranded structures formed by guanine-rich sequences and have emerged as key functional elements within the non-coding genome. G4s are enriched in regulatory regions and can modulate gene expression at both the DNA and RNA levels, influencing transcription, replication, and RNA processing. This positions them as key mediators linking non-coding variation to complex biological traits. Here, we profile putative G4s across five regulatory regions in 459,449 UK Biobank genomes and perform phenome-wide association analyses spanning 2,941 plasma protein abundances, 13,321 binary traits, and 1,682 quantitative traits. We show that putative G4-modifying variants are depleted under purifying selection despite elevated local mutability, and that they drive large, bidirectional associations with plasma proteins and clinical traits, including associations not captured by coding variants. Using a mechanism-aware collapsing strategy that groups rare non-coding variants by their predicted impact on G4 stability, we achieve stronger gene-level signals than those obtained with standard rare-variant collapsing approaches. Integrating non-coding and protein-truncating variants (PTVs) increases discovery power, revealing 843 significant associations missed by the PTV-only model. Replication in the Alliance for Genomic Discovery cohort demonstrates cross-cohort robustness. Our study suggests that G4s are widespread mediators of non-coding regulation and provides a framework for mechanism-informed target discovery and prioritization across the non-coding genome.
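The abstract does not say how putative G4s were called or how variant impact on G4 stability was predicted. As a rough illustration only, one widely used sequence heuristic looks for four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The sketch below assumes that pattern; the function names `find_putative_g4s` and `disrupts_g4` are hypothetical and are not taken from the study.

```python
import re

# Assumed heuristic (not stated in the abstract): four G-runs of length >= 3
# separated by loops of 1-7 nucleotides. Only the G-rich strand is scanned here;
# a fuller scan would also check the reverse complement.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def find_putative_g4s(seq):
    """Return (start, end, matched_sequence) for each non-overlapping motif hit."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


def disrupts_g4(ref_seq, pos, alt):
    """Return True if substituting `alt` at 0-based `pos` reduces the number of motif hits."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return len(find_putative_g4s(alt_seq)) < len(find_putative_g4s(ref_seq))


ref = "TTGGGAGGGTGGGAGGGTT"
hits = find_putative_g4s(ref)
lost = disrupts_g4(ref, 3, "A")  # G>A inside the first G-run
print(hits, lost)
```

A motif match says nothing about whether the structure forms in vivo, and a binary "motif lost" call is far coarser than a stability prediction; it only shows how a single substitution can abolish a putative G4.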

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Nature: 575 papers in training set, Top 0.6%, 26.2%
2. Nature Genetics: 240 papers in training set, Top 0.6%, 10.6%
3. Science: 429 papers in training set, Top 3%, 8.5%
4. Nature Communications: 4913 papers in training set, Top 25%, 7.3%
(50% of probability mass above)
5. Nature Structural & Molecular Biology: 218 papers in training set, Top 0.9%, 4.9%
6. Cell Genomics: 162 papers in training set, Top 0.9%, 4.4%
7. Nature Neuroscience: 216 papers in training set, Top 2%, 4.4%
8. Genome Biology: 555 papers in training set, Top 2%, 4.0%
9. Cell: 370 papers in training set, Top 6%, 3.6%
10. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 26%, 2.4%
11. Developmental Cell: 168 papers in training set, Top 7%, 2.1%
12. Nature Medicine: 117 papers in training set, Top 2%, 1.8%
13. Neuron: 282 papers in training set, Top 6%, 1.7%
14. Nature Microbiology: 133 papers in training set, Top 2%, 1.7%
15. Science Advances: 1098 papers in training set, Top 23%, 1.2%
16. The American Journal of Human Genetics: 206 papers in training set, Top 3%, 1.2%
17. Cell Systems: 167 papers in training set, Top 9%, 1.2%
18. Cancer Discovery: 61 papers in training set, Top 1%, 1.2%
19. Genome Medicine: 154 papers in training set, Top 7%, 0.9%
20. Nature Metabolism: 56 papers in training set, Top 2%, 0.9%
21. Nature Human Behaviour: 85 papers in training set, Top 4%, 0.9%
22. eLife: 5422 papers in training set, Top 55%, 0.8%
23. Cell Metabolism: 49 papers in training set, Top 2%, 0.8%
24. Journal of Clinical Investigation: 164 papers in training set, Top 8%, 0.7%
25. Nature Immunology: 71 papers in training set, Top 2%, 0.7%
26. Nucleic Acids Research: 1128 papers in training set, Top 20%, 0.7%
27. Molecular Cell: 308 papers in training set, Top 12%, 0.5%
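
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The following is a minimal sketch of that arithmetic; the helper name `journals_to_reach` is hypothetical, and only the six highest-ranked probabilities are copied in.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the six top-ranked journals, copied from the list above.
probs = [26.2, 10.6, 8.5, 7.3, 4.9, 4.4]


def journals_to_reach(probs, threshold=50.0):
    """Return (k, cumulative) for the smallest k whose first k probabilities sum to >= threshold."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k, total
    return None  # threshold never reached with the values supplied


k, total = journals_to_reach(probs)
print(k, round(total, 1))  # prints: 4 52.6
```

The first three journals sum to 45.3%, so the fourth is the one that carries the cumulative total past the 50% mark.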