
Relational biological structure improves fine-mapping of causal GWAS variants under weak signal

Estaji, E.; Zhao, S.-W.; Chen, Z.-Y.; Nie, S.; Mao, J.-F.

2026-05-16 genomics
10.64898/2026.05.15.725513 bioRxiv

Linkage disequilibrium (LD) makes causal GWAS variants indistinguishable from correlated neighbours; resolving them is the fine-mapping problem, and the challenge is species-specific: humans face dense, ancestry-imbalanced LD; yeast and Arabidopsis have exceptionally long LD; and crop germplasm has sparse, fragmented annotations that defeat human-biobank curation pipelines. Bayesian fine-mappers integrate annotations as flat per-variant priors, discarding the relational structure linking variants to tissue-specific eQTLs, pathways and protein-protein interactions. Hierarchical belief propagation (HBP) on a variant-gene-pathway factor graph matches Bayesian baselines at 5-40x their speed; an annotation-adaptive complement, graph-augmented fine-mapping (GAFM), wins 27-2 against SuSiE at weak signal and recovers LDLR, APOE, LPL, GCKR and ANGPTL3 at single-variant resolution across four Pan-UK Biobank ancestries. On the 3,000 Rice Genomes grain weight + shape panel, mixture-prior posterior reweightings of GAFM/HBP and their ensemble (GAFM-MX, HBP-MX, ENS) reach 47.6% top-1-PIP exact-position recovery of 21 panel-matched stable QTNs, the highest of any method and above SuSiE (28.6%) and SBayesRC (14.3%), at 200-700x SuSiE's per-locus speed. Across 692 leads in four species, a non-uniform per-variant prior, not uniform high coverage, lets the graph break LD ties: adding a regulatory-element flag to an otherwise uniform human cache raises the share of 321 Pan-UKB leads on which HBP is narrower than GAFM from 0% to 88%. These results recast multi-omics fine-mapping as a non-uniform-prior-curation problem rather than a uniform-coverage problem, and reframe post-GWAS analysis as message passing over biological structure rather than weighted regression on flattened annotations.
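The abstract's point that a non-uniform per-variant prior can break LD ties can be illustrated with a toy calculation. The sketch below is not the authors' HBP or GAFM method; it assumes the textbook single-causal-variant model, in which each variant's posterior inclusion probability (PIP) is proportional to its prior weight times its Bayes factor. The function name `single_causal_pips`, the log Bayes factors, and the prior weights are all hypothetical values chosen for illustration.

```python
import math

def single_causal_pips(log_bfs, priors):
    """PIPs under an assumed single-causal-variant model.

    PIP_j is proportional to prior_j * BF_j; the computation is done in
    log space and normalised so the PIPs sum to 1.
    """
    total = sum(priors)
    log_post = [lb + math.log(p / total) for lb, p in zip(log_bfs, priors)]
    shift = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(v - shift) for v in log_post]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical locus: variants 0 and 1 are in perfect LD, so the data give
# them identical evidence; variant 2 is weakly supported.
log_bfs = [6.0, 6.0, 1.0]

# Uniform prior: the two tied variants receive identical PIPs.
uniform = single_causal_pips(log_bfs, [1.0, 1.0, 1.0])

# Non-uniform prior: variant 1 carries an (assumed) annotation flag that
# quadruples its prior weight, which is enough to break the tie.
annotated = single_causal_pips(log_bfs, [1.0, 4.0, 1.0])

print("uniform prior PIPs:  ", [round(p, 3) for p in uniform])
print("annotated prior PIPs:", [round(p, 3) for p in annotated])
```

When two variants have identical evidence, the ratio of their PIPs equals the ratio of their prior weights, so a uniform prior can never separate them regardless of how many variants are annotated.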

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Nature Genetics: 240 papers in training set, Top 0.2%, 21.6%
2. Nature: 575 papers in training set, Top 1%, 17.9%
3. Science: 429 papers in training set, Top 3%, 9.7%
4. Nature Methods: 336 papers in training set, Top 1%, 9.7%
50% of probability mass above
5. Nature Biotechnology: 147 papers in training set, Top 2%, 6.1%
6. Genome Biology: 555 papers in training set, Top 1%, 6.1%
7. Cell Genomics: 162 papers in training set, Top 1%, 4.1%
8. Nature Communications: 4913 papers in training set, Top 42%, 3.4%
9. Nature Neuroscience: 216 papers in training set, Top 3%, 3.4%
10. The American Journal of Human Genetics: 206 papers in training set, Top 2%, 1.6%
11. Cell: 370 papers in training set, Top 13%, 1.4%
12. Nature Medicine: 117 papers in training set, Top 3%, 1.3%
13. Cell Systems: 167 papers in training set, Top 10%, 1.2%
14. Nature Machine Intelligence: 61 papers in training set, Top 3%, 0.9%
15. Nucleic Acids Research: 1128 papers in training set, Top 16%, 0.9%
16. Genome Medicine: 154 papers in training set, Top 8%, 0.8%
17. Molecular Cell: 308 papers in training set, Top 11%, 0.7%
18. Nature Plants: 84 papers in training set, Top 2%, 0.7%
19. Genome Research: 409 papers in training set, Top 5%, 0.7%
20. Nature Microbiology: 133 papers in training set, Top 5%, 0.7%
21. Nature Computational Science: 50 papers in training set, Top 2%, 0.7%
22. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 48%, 0.6%
23. Nature Human Behaviour: 85 papers in training set, Top 5%, 0.6%