
Federated cross-biobank conditional analysis identifies LDL-C lowering effects of DNAJC13 haploinsufficiency and LDLR regulation

Wright, H. I. W.; Darrous, L.; Ferrat, L.; Chundru, V. K.; Kamoun, A.; Wood, A. R.; Wright, C. F.; Patel, K. A.; Frayling, T. M.; Weedon, M. N.; Beaumont, R. N.; Hawkes, G.

2026-02-05 genetics
10.64898/2026.02.03.702791 bioRxiv

Whole genome sequencing in diverse population-scale biobanks offers new insights into the genetic architecture of complex traits from rare and non-coding variants. However, rare single variant and aggregate associations are often confounded by linkage disequilibrium and haplotype structure, resulting in large numbers of false-positive associations. Previous methods that rely on reference panels or linkage disequilibrium matrices to determine conditional independence in meta-analyses do not scale to very rare variants, which may be observed in only one biobank and can exhibit long-range haplotypes. Here, we implement a federated approach to perform iterative conditional meta-analysis on individual-level genotype and phenotype data across biobanks while adhering to data sharing policies. We applied our methodology to a meta-analysis of LDL-C in 614,375 individuals from UK Biobank and All of Us, encompassing six genetic ancestry groups. After conditioning, only 4.3% of significantly associated rare single variants and 6.9% of aggregates remained statistically independent. The proportion of significant aggregates that remained independent after conditioning was higher for coding-based tests than for non-coding tests. We further validate that our approach effectively suppresses false-positive associations using simulations centred on the LDLR locus. We identify allelic series of variants associated with reduced LDL-C, including loss-of-function variants in DNAJC13 and variants in the 3′ untranslated region of LDLR. Our results highlight that federated conditioning can distinguish independent rare variant signals from linkage and haplotype structure artifacts in multi-ancestry meta-analyses across separate biobanks.
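
The general idea of iterative conditional meta-analysis across sites can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes ordinary least squares at each site, fixed-effect inverse-variance weighting, and a z-score threshold, and the names `site_summary`, `meta_z`, and `iterative_conditioning` are hypothetical. Each site fits its regression on its own individual-level data and shares only an effect estimate and standard error; a variant that is monomorphic at a site simply contributes nothing from that site.

```python
import numpy as np

def site_summary(y, G, test_idx, cond_idx):
    """Site-local regression of y on one variant plus the conditioning variants.

    Returns (beta, se) for the tested variant, or None if the variant is
    monomorphic at this site. Only these summaries would leave the site.
    """
    g = G[:, test_idx]
    if np.var(g) == 0:
        return None
    n = len(y)
    X = np.column_stack([np.ones(n), g] + [G[:, j] for j in cond_idx])
    XtX_inv = np.linalg.pinv(X.T @ X)  # pseudo-inverse tolerates all-zero columns
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / max(n - X.shape[1], 1)
    return coef[1], np.sqrt(sigma2 * XtX_inv[1, 1])

def meta_z(summaries):
    """Fixed-effect inverse-variance meta-analysis; returns the combined z-score."""
    w = np.array([1.0 / se**2 for _, se in summaries])
    b = np.array([beta for beta, _ in summaries])
    beta = np.sum(w * b) / np.sum(w)
    return beta * np.sqrt(np.sum(w))  # beta divided by its standard error

def iterative_conditioning(sites, n_variants, z_threshold=5.0, max_rounds=10):
    """Forward selection: repeatedly add the strongest remaining variant to the
    conditioning set until nothing passes the (assumed) z-score threshold."""
    selected = []
    for _ in range(max_rounds):
        best_j, best_z = None, 0.0
        for j in range(n_variants):
            if j in selected:
                continue
            summaries = [site_summary(y, G, j, selected) for y, G in sites]
            summaries = [s for s in summaries if s is not None]
            if not summaries:
                continue
            z = abs(meta_z(summaries))
            if z > best_z:
                best_j, best_z = j, z
        if best_j is None or best_z < z_threshold:
            break
        selected.append(best_j)
    return selected
```

Here `sites` is a list of `(y, G)` pairs, one per biobank, where `y` is a phenotype vector and `G` a samples-by-variants genotype matrix with a shared variant ordering. In this sketch, a variant that is associated only through correlation with an already selected variant loses its signal once that variant enters the conditioning set.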

Matching journals

The top 3 journals together account for just over 50% of the predicted probability mass (23.0% + 23.0% + 8.6% = 54.6%).

Rank  Journal                                          Papers in training set  Percentile  Probability
1     Nature Genetics                                  240                     Top 0.1%    23.0%
2     The American Journal of Human Genetics           206                     Top 0.1%    23.0%
3     Nature Communications                            4913                    Top 22%     8.6%
(50% of probability mass above)
4     Genome Biology                                   555                     Top 0.6%    8.4%
5     Cell Genomics                                    162                     Top 0.5%    6.5%
6     Genome Medicine                                  154                     Top 3%      3.1%
7     Genome Research                                  409                     Top 1%      2.7%
8     PLOS Genetics                                    756                     Top 7%      2.1%
9     Nature                                           575                     Top 12%     1.5%
10    PLOS Computational Biology                       1633                    Top 18%     1.4%
11    Nucleic Acids Research                           1128                    Top 13%     1.3%
12    International Journal of Epidemiology            74                      Top 2%      1.1%
13    Proceedings of the National Academy of Sciences  2130                    Top 39%     1.0%
14    Genetic Epidemiology                             46                      Top 0.7%    0.9%
15    Human Genetics and Genomics Advances             70                      Top 0.7%    0.8%
16    Nature Biotechnology                             147                     Top 7%      0.8%
17    eLife                                            5422                    Top 55%     0.8%
18    Bioinformatics                                   1061                    Top 9%      0.8%
19    Scientific Reports                               3102                    Top 75%     0.7%
20    Communications Biology                           886                     Top 28%     0.7%
21    Science Translational Medicine                   111                     Top 7%      0.7%
22    Frontiers in Genetics                            197                     Top 12%     0.5%
23    Briefings in Bioinformatics                      326                     Top 8%      0.5%
24    Nature Human Behaviour                           85                      Top 5%      0.5%
25    PLOS ONE                                         4510                    Top 73%     0.5%
26    Science                                          429                     Top 22%     0.5%