Calibration improves estimation of linkage disequilibrium on low sample sizes
Bercovich Szulmajster, U.; Wiuf, C.; Albrechtsen, A.
Linkage disequilibrium (LD) is a central statistic in population genetic studies, commonly measured by the squared correlation between pairs of genetic variants. An important drawback of this measure is its upward bias at finite sample sizes. Several methods exist that correct for this sample-size bias. However, because the squared correlation is a ratio, there is no unbiased method to compute it. In this work, we present a procedure to calibrate those methods using a non-parametric approach based on simulated data. Forward modeling generates genotype matrices with known parameters, and an inverse mapping then recovers estimates of the underlying parameters. A mean-centering calibration is then applied to the recovered estimate of the true parameter. We apply this approach to real and simulated data and show consistent improvement in accuracy compared to other sample-size-aware methods. Furthermore, to study the effects on downstream analyses, we analyze classification performance in LD pruning, where we also observe an improvement, particularly in extreme cases with low sample sizes of 5 or 10 individuals.
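To make the simulate-then-calibrate idea concrete, the following is a minimal sketch and not the authors' procedure. It assumes two biallelic loci with allele frequencies of 0.5, random mating, a plain squared Pearson correlation of genotypes as the raw estimator, and a simple lookup table that maps a raw estimate back to the true value whose simulated mean matches it. All function names (`simulate_genotypes`, `r2_hat`, `build_table`, `calibrate`) are hypothetical and chosen for illustration only.

```python
import random

HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]  # AB, Ab, aB, ab


def simulate_genotypes(n, r2, rng, p_a=0.5, p_b=0.5):
    """Forward model (assumed): n diploid genotypes at two loci with true r^2."""
    # r^2 = D^2 / (pA(1-pA) pB(1-pB)); positive D is assumed here.
    d = (r2 * p_a * (1 - p_a) * p_b * (1 - p_b)) ** 0.5
    freqs = [p_a * p_b + d, p_a * (1 - p_b) - d,
             (1 - p_a) * p_b - d, (1 - p_a) * (1 - p_b) + d]
    # Guard against tiny negative values from rounding; assumes r2 is attainable.
    freqs = [max(f, 0.0) for f in freqs]
    haps = rng.choices(HAPLOTYPES, weights=freqs, k=2 * n)
    x = [haps[2 * i][0] + haps[2 * i + 1][0] for i in range(n)]
    y = [haps[2 * i][1] + haps[2 * i + 1][1] for i in range(n)]
    return x, y


def r2_hat(x, y):
    """Raw estimator: squared Pearson correlation; None if a locus is monomorphic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


def build_table(n, grid, reps, rng):
    """For each true r^2 in grid, record the mean raw estimate over simulations."""
    table = []
    for true_r2 in grid:
        estimates = []
        for _ in range(reps):
            est = r2_hat(*simulate_genotypes(n, true_r2, rng))
            if est is not None:
                estimates.append(est)
        table.append((true_r2, sum(estimates) / len(estimates)))
    return table


def calibrate(est, table):
    """Inverse mapping: true r^2 whose simulated mean equals the raw estimate."""
    if est <= table[0][1]:
        return table[0][0]
    for (t0, m0), (t1, m1) in zip(table, table[1:]):
        if m0 <= est <= m1 and m1 > m0:
            # Linear interpolation between neighbouring grid points.
            return t0 + (t1 - t0) * (est - m0) / (m1 - m0)
    return table[-1][0]


if __name__ == "__main__":
    rng = random.Random(1)
    table = build_table(5, [0.0, 0.25, 0.5, 0.75, 1.0], 2000, rng)
    for true_r2, mean_est in table:
        print(f"true r2 = {true_r2:.2f}  mean raw estimate = {mean_est:.3f}")
    raw = r2_hat(*simulate_genotypes(5, 0.25, rng))
    if raw is not None:
        print("raw:", raw, "calibrated:", calibrate(raw, table))
```

With 5 individuals, the mean raw estimate at a true value of zero comes out well above zero in this sketch, which is the upward bias the abstract describes. The table lookup stands in for the calibration step only loosely; the actual method may differ in the estimator, the simulation model, and how the centering is done.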
Matching journals
The top 4 journals account for 50% of the predicted probability mass.