Calibration improves estimation of linkage disequilibrium on low sample sizes

Bercovich Szulmajster, U.; Wiuf, C.; Albrechtsen, A.

2026-03-07 bioinformatics
10.64898/2026.03.05.709321 bioRxiv

Linkage disequilibrium is a central statistic in population genetic studies, commonly measured by the squared correlation between pairs of genetic variants. An important drawback of this measure is its upward bias caused by a finite sample size. To handle this, different methods exist that correct for sample-size bias. However, because the correlation consists of a ratio, there is no unbiased method to compute it. In this work, we present a procedure to calibrate those methods using a non-parametric approach with simulated data. This is done with forward modeling to generate genotype matrices with known parameters, followed by an inverse mapping to recover estimates of the underlying parameters. Then, a mean-centering calibration is applied to the recovered estimate of the true parameter. This approach is applied to real and simulated data, showing consistent improvement in accuracy compared to other sample-size-aware methods. Furthermore, to study the effects on downstream analyses, we analyze the classification performance on LD pruning, where we also observe an improvement, particularly in extreme cases with low sample sizes of 5 or 10 individuals.
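The upward bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' calibration procedure; it only assumes two independent biallelic variants (so the true squared correlation is 0) with genotypes drawn as Binomial(2, freq), and averages the sample r^2 over many replicates. The function name `mean_sample_r2` and all parameter values are hypothetical, chosen for illustration.

```python
import numpy as np

def mean_sample_r2(n_individuals, n_reps=2000, freq=0.5, seed=0):
    """Average sample r^2 between two independent variants (true r^2 = 0)."""
    rng = np.random.default_rng(seed)
    # Genotypes coded 0/1/2, drawn independently for the two variants.
    g1 = rng.binomial(2, freq, size=(n_reps, n_individuals)).astype(float)
    g2 = rng.binomial(2, freq, size=(n_reps, n_individuals)).astype(float)
    # Center each replicate, then form the squared Pearson correlation.
    d1 = g1 - g1.mean(axis=1, keepdims=True)
    d2 = g2 - g2.mean(axis=1, keepdims=True)
    cov = (d1 * d2).sum(axis=1)
    var = (d1 ** 2).sum(axis=1) * (d2 ** 2).sum(axis=1)
    # Skip replicates where a variant is monomorphic in the sample,
    # since r^2 is undefined there.
    keep = var > 0
    return float((cov[keep] ** 2 / var[keep]).mean())

for n in (5, 10, 50):
    print(n, round(mean_sample_r2(n), 3))
```

Under these assumptions the average sample r^2 stays well above the true value of 0 and grows as the sample shrinks, which is the kind of sample-size bias that the corrections and the calibration described above aim to reduce.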

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. BMC Bioinformatics: 383 papers in training set, Top 0.2%, 22.4%
2. Bioinformatics: 1061 papers in training set, Top 1%, 22.4%
3. PLOS Computational Biology: 1633 papers in training set, Top 7%, 4.8%
4. The American Journal of Human Genetics: 206 papers in training set, Top 1%, 3.9%
50% of probability mass above
5. PLOS ONE: 4510 papers in training set, Top 36%, 3.9%
6. Briefings in Bioinformatics: 326 papers in training set, Top 2%, 3.6%
7. PLOS Genetics: 756 papers in training set, Top 5%, 3.6%
8. Frontiers in Genetics: 197 papers in training set, Top 2%, 3.6%
9. NAR Genomics and Bioinformatics: 214 papers in training set, Top 1%, 2.6%
10. Bioinformatics Advances: 184 papers in training set, Top 2%, 2.6%
11. Scientific Reports: 3102 papers in training set, Top 51%, 2.1%
12. Molecular Ecology Resources: 161 papers in training set, Top 0.6%, 1.7%
13. European Journal of Human Genetics: 49 papers in training set, Top 0.8%, 1.3%
14. Genetic Epidemiology: 46 papers in training set, Top 0.6%, 1.2%
15. Nature Communications: 4913 papers in training set, Top 58%, 0.9%
16. Genetics: 225 papers in training set, Top 4%, 0.9%
17. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 2%, 0.9%
18. Statistics in Medicine: 34 papers in training set, Top 0.3%, 0.8%
19. Nucleic Acids Research: 1128 papers in training set, Top 17%, 0.8%
20. Heredity: 53 papers in training set, Top 0.3%, 0.7%
21. Peer Community Journal: 254 papers in training set, Top 4%, 0.7%
22. GENETICS: 189 papers in training set, Top 2%, 0.7%
23. BMC Genomics: 328 papers in training set, Top 7%, 0.6%
24. Genome Research: 409 papers in training set, Top 5%, 0.6%