Federated cross-biobank conditional analysis identifies LDL-C lowering effects of DNAJC13 haploinsufficiency and LDLR regulation
Wright, H. I. W.; Darrous, L.; Ferrat, L.; Chundru, V. K.; Kamoun, A.; Wood, A. R.; Wright, C. F.; Patel, K. A.; Frayling, T. M.; Weedon, M. N.; Beaumont, R. N.; Hawkes, G.
Whole genome sequencing in diverse population-scale biobanks offers new insights into the genetic architecture of complex traits from rare and non-coding variants. However, rare single-variant and aggregate associations are often confounded by linkage disequilibrium and haplotype structure, resulting in large numbers of false-positive associations. Previous methods that rely on reference panels or linkage disequilibrium matrices to determine conditional independence in meta-analyses do not scale to very rare variants, which may be observed in only one biobank and can exhibit long-range haplotypes. Here, we implement a federated approach to perform iterative conditional meta-analysis on individual-level genotype and phenotype data across biobanks while adhering to data sharing policies. We applied our methodology to a meta-analysis of LDL-C in 614,375 individuals from UK Biobank and All of Us, encompassing six genetic ancestry groups. After conditioning, only 4.3% of significantly associated rare single variants and 6.9% of aggregates remained statistically independent. The proportion of significant aggregates that remained independent after conditioning was higher for coding-based tests than for non-coding tests. We further validate that our approach effectively suppresses false-positive associations using simulations centred on the LDLR locus. We identify allelic series of variants associated with reduced LDL-C, including loss-of-function variants in DNAJC13 and variants in the 3′ untranslated region of LDLR. Our results highlight that federated conditioning can distinguish independent rare variant signals from linkage disequilibrium and haplotype structure artifacts in multi-ancestry meta-analyses across separate biobanks.
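The general idea of federated iterative conditioning can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a quantitative trait, ordinary least squares within each site, fixed-effect inverse-variance-weighted meta-analysis, and simple forward stepwise selection. The function names (`site_summary`, `ivw_meta`, `federated_conditional`), the z-score threshold, and the simulated data are all hypothetical. Each site fits its regression on its own individual-level data and shares only an effect estimate and standard error.

```python
import numpy as np

def site_summary(y, g_test, g_cond):
    """Run inside one biobank: effect of g_test on y, adjusted for the
    conditioning variants. Only (beta, se) leave the site.
    Returns None if the variant carries no independent information there
    (e.g. it is monomorphic, or fully explained by the conditioning set)."""
    y = np.asarray(y, dtype=float)
    g_test = np.asarray(g_test, dtype=float)
    n = len(y)
    # Covariates: intercept plus the variants already conditioned on.
    C = np.column_stack([np.ones(n)] + [np.asarray(g, dtype=float) for g in g_cond])
    # Residualise genotype and phenotype on the covariates.
    g_res = g_test - C @ np.linalg.lstsq(C, g_test, rcond=None)[0]
    y_res = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]
    gg = g_res @ g_res
    if gg < 1e-8:
        return None
    beta = (g_res @ y_res) / gg
    resid = y_res - beta * g_res
    dof = n - C.shape[1] - 1
    se = np.sqrt((resid @ resid) / dof / gg)
    return beta, se

def ivw_meta(summaries):
    """Fixed-effect inverse-variance-weighted meta-analysis of (beta, se) pairs."""
    betas = np.array([b for b, _ in summaries])
    weights = np.array([1.0 / s**2 for _, s in summaries])
    beta = np.sum(weights * betas) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return beta, se, beta / se

def federated_conditional(sites, z_threshold=5.0):
    """Forward stepwise conditioning. `sites` is a list of (y, G) pairs that
    stand in for separate biobanks; G is an individuals-by-variants matrix.
    z_threshold is an illustrative value, not one taken from the study."""
    n_variants = sites[0][1].shape[1]
    selected = []
    while len(selected) < n_variants:
        best_j, best_z = None, 0.0
        for j in range(n_variants):
            if j in selected:
                continue
            summaries = [site_summary(y, G[:, j], [G[:, k] for k in selected])
                         for y, G in sites]
            # A variant may be informative in only one site; drop the others.
            summaries = [s for s in summaries if s is not None]
            if not summaries:
                continue
            _, _, z = ivw_meta(summaries)
            if abs(z) > abs(best_z):
                best_j, best_z = j, z
        if best_j is None or abs(best_z) < z_threshold:
            break
        selected.append(best_j)
    return selected

# Simulated example (hypothetical data): variant 0 is causal, variant 1 is in
# strong linkage disequilibrium with it, variant 2 is unrelated to the trait.
rng = np.random.default_rng(0)

def simulate_site(n):
    g0 = rng.binomial(2, 0.3, n)
    g1 = np.where(rng.random(n) < 0.1, rng.binomial(2, 0.3, n), g0)
    g2 = rng.binomial(2, 0.3, n)
    y = 0.5 * g0 + rng.normal(size=n)
    return y, np.column_stack([g0, g1, g2])

sites = [simulate_site(5000), simulate_site(5000)]
selected = federated_conditional(sites)
print(selected)
```

In this toy setting the variant in linkage disequilibrium with the causal variant is strongly associated marginally, but its signal should disappear once the causal variant is in the conditioning set, which is the kind of false positive that conditioning is meant to suppress.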