The reliability and accuracy of recombination inferred by Shapeit2 duoHMM on whole genome sequence

Oubninte, S.; Ruczinski, I.; Yanek, L. R.; Mathias, R.; Bureau, A.

2026-05-10 genomics
10.64898/2026.05.06.723015 bioRxiv

Few studies have assessed the performance of population-based phasing combined with parental genotypes for inferring recombination from whole genome sequence (WGS) data. In this study, our objective was to evaluate whether Shapeit2 duoHMM, a Hidden Markov Model that uses parental genotypes, infers recombination events reliably on WGS and localizes them to narrower intervals than on SNP arrays. We based our analysis on the overlap between recombination events inferred by Merlin on SNP genotypes and by Shapeit2 on WGS and SNP genotypes. We used a sample of 61 extended families from the GeneSTAR study with TOPMed freeze 8 WGS data on 580 sequenced subjects (60% of the sample). Shapeit2 was run on WGS with a window size of 500 kilobases and 200 states. To mimic a SNP array, we extracted the genotypes of 355,112 autosomal markers present on the Illumina OmniExpress array. The number of recombination events per meiosis inferred by Shapeit2 on the WGS data (36.8) was consistent with the expected number over the autosomes (35.7), whereas Merlin overestimated this number (115.0). Of the Shapeit2 recombination events on WGS, 73% were also detected by Merlin, a proportion rising to 91% when restricting to events also inferred by Shapeit2 on OmniExpress genotypes. Furthermore, Shapeit2 recombination intervals were narrower on WGS than on OmniExpress genotypes (median of 4,530 bp vs. 49,458 bp). This suggests that Shapeit2 on WGS is a reliable and accurate method for inferring recombination events.
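The comparison above rests on two summaries: the fraction of events from one method that overlap events from another, and the median width of the inferred recombination intervals. The following is a minimal sketch of how such summaries could be computed, assuming each event is reduced to a (meiosis, chromosome, start, end) interval and that an event counts as "detected" when it intersects any reference interval for the same meiosis and chromosome. The Event class, the matching rule, and the toy coordinates are hypothetical illustrations, not the study's data or necessarily its exact criterion.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Event:
    """A recombination event localized to an interval (hypothetical schema)."""
    meiosis: str  # identifier of the parent-to-offspring transmission
    chrom: str
    start: int    # interval start in bp (inclusive)
    end: int      # interval end in bp (inclusive)


def overlaps(a: Event, b: Event) -> bool:
    """Assumed matching rule: same meiosis, same chromosome, intersecting intervals."""
    return (a.meiosis == b.meiosis and a.chrom == b.chrom
            and a.start <= b.end and b.start <= a.end)


def detected_fraction(query: list[Event], reference: list[Event]) -> float:
    """Fraction of query events that overlap at least one reference event."""
    if not query:
        return 0.0
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return hits / len(query)


def median_width(events: list[Event]) -> float:
    """Median interval width in bp (end - start)."""
    return median(e.end - e.start for e in events)


# Toy intervals for illustration only (not results from the study)
shapeit_wgs = [
    Event("m1", "1", 1_000_000, 1_004_000),
    Event("m1", "2", 5_000_000, 5_006_000),
    Event("m2", "1", 2_000_000, 2_003_000),
    Event("m2", "3", 7_000_000, 7_010_000),
]
merlin = [
    Event("m1", "1", 990_000, 1_100_000),    # overlaps the first WGS event
    Event("m1", "2", 4_000_000, 4_500_000),  # ends before the second WGS event
    Event("m2", "1", 1_950_000, 2_050_000),  # overlaps the third WGS event
    Event("m2", "3", 8_000_000, 8_200_000),  # starts after the fourth WGS event
]

print(detected_fraction(shapeit_wgs, merlin))  # 0.5
print(median_width(shapeit_wgs))               # 5000.0
```

Requiring the same meiosis and chromosome keeps events on different transmissions from being matched by coincidence; a real analysis might also tolerate small gaps between intervals, which this sketch does not.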

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. The American Journal of Human Genetics: 206 papers in training set, Top 0.1%, 32.6%
2. PLOS Genetics: 756 papers in training set, Top 0.4%, 18.4%
50% of probability mass above
3. Genome Research: 409 papers in training set, Top 0.4%, 6.3%
4. Nature Communications: 4913 papers in training set, Top 30%, 6.2%
5. GENETICS: 189 papers in training set, Top 0.2%, 4.3%
6. PLOS Computational Biology: 1633 papers in training set, Top 12%, 2.7%
7. Genome Biology: 555 papers in training set, Top 3%, 2.6%
8. Scientific Reports: 3102 papers in training set, Top 51%, 2.1%
9. Molecular Ecology Resources: 161 papers in training set, Top 0.6%, 1.8%
10. PLOS ONE: 4510 papers in training set, Top 52%, 1.8%
11. Nucleic Acids Research: 1128 papers in training set, Top 12%, 1.5%
12. G3: Genes, Genomes, Genetics: 222 papers in training set, Top 0.5%, 1.3%
13. Genetics: 225 papers in training set, Top 3%, 1.2%
14. eLife: 5422 papers in training set, Top 50%, 1.2%
15. European Journal of Human Genetics: 49 papers in training set, Top 1%, 0.9%
16. Frontiers in Genetics: 197 papers in training set, Top 8%, 0.9%
17. BMC Genomics: 328 papers in training set, Top 5%, 0.9%
18. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 2%, 0.8%
19. Bioinformatics: 1061 papers in training set, Top 9%, 0.8%
20. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 45%, 0.7%
21. Genome Medicine: 154 papers in training set, Top 8%, 0.7%
22. BMC Bioinformatics: 383 papers in training set, Top 8%, 0.6%
23. Nature Genetics: 240 papers in training set, Top 9%, 0.6%
24. NAR Genomics and Bioinformatics: 214 papers in training set, Top 4%, 0.6%
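The statement that the top 2 journals account for 50% of the predicted probability mass follows from a running sum of the listed percentages (32.6% + 18.4% = 51.0%). A minimal sketch of that arithmetic, assuming the percentages are taken in rank order as listed; the function name is hypothetical.

```python
def journals_to_reach(mass: float, probs: list[float]) -> int:
    """Smallest number of top-ranked journals whose probabilities sum to >= mass (in %)."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= mass:
            return k
    raise ValueError("listed probabilities never reach the requested mass")


# Predicted probabilities (%) of the five top-ranked journals, in rank order
top_probs = [32.6, 18.4, 6.3, 6.2, 4.3]

print(journals_to_reach(50.0, top_probs))  # 2
```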