Evaluating the Contribution of Genome 3D Folding to Variation in Human Height Using Machine Learning

Gu, W.; Gilbertson, E.; Baranzini, S. E.; Salem, R.; Capra, J. A.

2025-09-15 genetics
10.1101/2025.09.09.675195 bioRxiv

Genome-wide association studies (GWAS) have identified thousands of variants associated with complex traits, yet the majority lie in noncoding regions, making it difficult to determine their functional impact. Alterations to the three-dimensional (3D) spatial interactions among gene regulatory elements are increasingly recognized as a mechanism by which genetic variants influence gene expression. However, experimentally evaluating whether variants disrupt 3D-genome structure is not feasible at GWAS scale. To address this, we developed a computational framework that integrates GWAS summary statistics with predictions from the Akita sequence-based deep learning model of 3D chromatin contacts. We applied the framework to 9,917 genomic regions associated with human height, assessing both individual variants and haplotypes for their predicted impact on 3D genome architecture. Only a small fraction of height-associated haplotypes had substantial predicted disruption of 3D folding (17 regions, 0.17%, exceeded a disruption score of 0.1). Considering all common variants in a haplotype together generally produced greater perturbations than individual variants, but several highly divergent regions were driven by single variants. We highlight a variant that disrupts the binding motif at a confirmed CTCF binding site and is predicted to modify 3D genome contacts with the LCOR promoter, suggesting that 3D-genome-mediated disruption of gene regulation underlies the association with height. This work presents a scalable and interpretable strategy for integrating 3D genome modeling with GWAS, enabling investigation of this important regulatory mechanism in the connection of non-coding genetic variation to complex traits.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Human Genetics and Genomics Advances | 70 papers in training set | Top 0.1% | 11.7%
2. Frontiers in Genetics | 197 papers in training set | Top 0.2% | 11.7%
3. PLOS Genetics | 756 papers in training set | Top 1.0% | 11.7%
4. The American Journal of Human Genetics | 206 papers in training set | Top 0.5% | 8.7%
5. Nature Communications | 4913 papers in training set | Top 36% | 4.1%
6. Cell Genomics | 162 papers in training set | Top 2% | 3.5%

50% of probability mass above

7. PLOS Computational Biology | 1633 papers in training set | Top 11% | 3.4%
8. iScience | 1063 papers in training set | Top 6% | 3.4%
9. eLife | 5422 papers in training set | Top 28% | 3.4%
10. Nucleic Acids Research | 1128 papers in training set | Top 7% | 3.4%
11. Scientific Reports | 3102 papers in training set | Top 42% | 2.9%
12. Bioinformatics | 1061 papers in training set | Top 6% | 2.9%
13. Genetics | 225 papers in training set | Top 2% | 2.7%
14. Communications Biology | 886 papers in training set | Top 6% | 2.0%
15. Human Molecular Genetics | 130 papers in training set | Top 1% | 2.0%
16. Genome Biology | 555 papers in training set | Top 4% | 1.8%
17. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 31% | 1.8%
18. GENETICS | 189 papers in training set | Top 0.6% | 1.7%
19. European Journal of Human Genetics | 49 papers in training set | Top 0.7% | 1.6%
20. Genome Research | 409 papers in training set | Top 3% | 1.3%
21. Bioinformatics Advances | 184 papers in training set | Top 4% | 1.3%
22. BMC Bioinformatics | 383 papers in training set | Top 6% | 0.8%
23. Nature Genetics | 240 papers in training set | Top 7% | 0.8%
24. PLOS ONE | 4510 papers in training set | Top 68% | 0.7%
25. Science Advances | 1098 papers in training set | Top 32% | 0.7%
26. Journal of Genetics and Genomics | 36 papers in training set | Top 2% | 0.7%
27. Genetic Epidemiology | 46 papers in training set | Top 0.9% | 0.7%
28. Human Genetics | 25 papers in training set | Top 0.5% | 0.7%
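
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch of that arithmetic only, not the page's actual method; the names `top_probs` and `journals_to_reach` are hypothetical and chosen for illustration, and the values are the rounded percentages shown for ranks 1 to 6.

```python
from itertools import accumulate

# Rounded predicted probabilities (%) for ranks 1-6, copied from the list above.
top_probs = [11.7, 11.7, 11.7, 8.7, 4.1, 3.5]


def journals_to_reach(probs, target):
    """Return how many leading entries are needed for the running
    total to reach `target`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return count
    return None


# Hypothetical helper applied to the listed values with a 50% target.
print(journals_to_reach(top_probs, 50.0))
```

With these rounded values the running total is about 47.9% after five journals and about 51.4% after six, which agrees with the cutoff marked in the list.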