The causes of signed linkage disequilibrium within genomic datasets
Stetsenko, R.; Merot, C.; Glemin, S.; Roze, D.
Several recent studies have quantified signed linkage disequilibrium (LD) among mutations in genomic datasets, often reporting positive LD, particularly among mutations presumed to be less deleterious, such as synonymous variants. In this article, we investigate two potential sources of this positive LD: the focus on rare alleles, as adopted in several previous studies, and errors arising in the mapping of short-read sequences onto a reference genome. Using coalescent simulations, we extend previous theoretical results on the effect of focusing on rare alleles, and show that derived alleles present at similar frequencies tend to be in positive LD. Reanalyzing datasets from Capsella grandiflora and Drosophila melanogaster, we show that LD among synonymous derived alleles vanishes in the absence of any conditioning on frequency, while LD between mutations categorized as potentially deleterious by the SIFT4G program remains positive. However, we show that in both cases, this positive LD may be at least partly caused by the potential mismapping of a small fraction of sequences in some individuals, which could be a consequence of structural variants that are absent from the reference genome. Overall, these results show that average signed LD among mutations can be strongly affected by technical artifacts, even if these concern only a minority of variants. Finally, we discuss other possible sources of positive LD among deleterious mutations.
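To make the quantity discussed in the abstract concrete, the following is a minimal sketch of how an average signed LD among derived alleles could be computed, with and without conditioning on allele frequency. It assumes phased haplotypes coded 0 (ancestral) and 1 (derived) and uses the standard coefficient D_ij = f_ij - f_i f_j. The function names `signed_ld` and `mean_signed_ld` and the `max_freq` parameter are hypothetical and chosen for illustration; the estimator used in the study itself may differ.

```python
import numpy as np


def signed_ld(haplotypes):
    """Matrix of signed LD coefficients D_ij = f_ij - f_i * f_j.

    haplotypes: 2-D array (n_haplotypes x n_sites),
    1 = derived allele, 0 = ancestral allele (assumed coding).
    """
    h = np.asarray(haplotypes, dtype=float)
    n = h.shape[0]
    freq = h.mean(axis=0)      # derived-allele frequency at each site
    joint = (h.T @ h) / n      # frequency of haplotypes carrying both derived alleles
    return joint - np.outer(freq, freq)


def mean_signed_ld(haplotypes, max_freq=None):
    """Average D over all pairs of distinct sites.

    If max_freq is given (hypothetical parameter), only sites whose
    derived-allele frequency lies in (0, max_freq] are kept, mimicking
    a focus on rare alleles.
    """
    h = np.asarray(haplotypes)
    if max_freq is not None:
        freq = h.mean(axis=0)
        h = h[:, (freq > 0) & (freq <= max_freq)]
    if h.shape[1] < 2:
        return float("nan")    # no pair of sites left to average over
    d = signed_ld(h)
    upper = np.triu_indices(h.shape[1], k=1)   # each pair counted once
    return float(d[upper].mean())


# Toy data: 4 haplotypes x 3 sites (illustrative only, not from the study).
toy = np.array([
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
])
print(mean_signed_ld(toy))                 # all sites, no frequency conditioning
print(mean_signed_ld(toy, max_freq=0.25))  # rare derived alleles only
```

In this toy example the two rare derived alleles always occur on the same haplotype, so restricting the average to rare alleles gives a larger positive value than averaging over all sites. This only illustrates the mechanics of conditioning on frequency and does not reproduce any result from the article.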