Accurate imputation of inversions in human genomes using different algorithms and data sources

Yakymenko, I.; Mompart, A.; Caceres, M.

2026-01-26 genomics
10.64898/2026.01.23.701363 bioRxiv

Complex genomic regions harbor different structural arrangements that can mutate quite rapidly, which makes determining their functional effects very difficult. Inversions generated by homology-mediated mechanisms are especially challenging to characterize, because of the inverted repeats at their breakpoints and because most of them are recurrent. Imputation can infer missing genotypes, but it has mainly been applied to simple variants, and little is known about how well it works for human inversions. Here, we tested five common imputation programs on a set of 52 inversions that have been experimentally genotyped in multiple samples and lack SNPs in perfect linkage disequilibrium. Using whole-genome sequencing data and simulated microarrays with variable SNP density, we found that 40.4-75.5% of inversions could be accurately imputed in three human populations by at least one program, with results depending mainly on the number of available SNPs, the genotyped samples and the recurrence of the inversions. Genotype probability filtering was also a key factor for inversion imputation accuracy. In particular, Minimac4 and IMPUTE5 accurately imputed more inversions and produced fewer poorly imputed individuals than the other methods. This work therefore helps to optimize inversion imputation, making it possible to study the functional impact of inversions.
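To illustrate what genotype probability filtering does to imputation accuracy, here is a minimal sketch (not the authors' pipeline). It assumes a biallelic inversion coded as 0/1/2 copies of the inverted allele, per-sample genotype probabilities from an imputation program, and a hypothetical threshold `min_prob`; the function name `filter_and_score` and the toy data are illustrative only. Calls whose most likely genotype falls below the threshold are treated as missing, and concordance with the experimental genotypes is computed on the remaining calls.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical genotype coding for a biallelic inversion:
# 0 = standard homozygote, 1 = heterozygote, 2 = inverted homozygote.


def filter_and_score(
    probs: Sequence[Sequence[float]],
    truth: Sequence[int],
    min_prob: float = 0.9,  # assumed threshold, not taken from the paper
) -> Tuple[Optional[float], float]:
    """Keep calls whose best genotype probability is >= min_prob.

    Returns (concordance with experimental genotypes among kept calls,
    fraction of samples kept). Concordance is None if no call passes.
    """
    kept = 0
    correct = 0
    for p, true_gt in zip(probs, truth):
        # Most likely genotype and its probability.
        best = max(range(len(p)), key=lambda g: p[g])
        if p[best] < min_prob:
            continue  # low-confidence call: treat as missing
        kept += 1
        if best == true_gt:
            correct += 1
    concordance = correct / kept if kept else None
    return concordance, kept / len(truth)


# Toy data: imputed genotype probabilities and experimental genotypes.
probs = [
    [0.95, 0.04, 0.01],
    [0.20, 0.45, 0.35],
    [0.02, 0.03, 0.95],
    [0.05, 0.92, 0.03],
]
truth = [0, 2, 2, 0]

print(filter_and_score(probs, truth, min_prob=0.0))  # no filtering
print(filter_and_score(probs, truth, min_prob=0.9))
```

On this toy data, concordance rises from 0.5 without filtering to about 0.67 with a 0.9 threshold, at the cost of keeping only 75% of samples. That trade-off between accuracy and the number of retained individuals is the reason the filtering threshold matters.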

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Frontiers in Genetics: 197 papers in training set, Top 0.1%, 22.7%
2. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 0.6%, 9.2%
3. Nucleic Acids Research: 1128 papers in training set, Top 2%, 8.5%
4. Scientific Reports: 3102 papers in training set, Top 23%, 4.9%
5. PLOS Genetics: 756 papers in training set, Top 4%, 4.0%
6. Genome Biology: 555 papers in training set, Top 2%, 3.6%
50% of probability mass above
7. PLOS Computational Biology: 1633 papers in training set, Top 11%, 3.1%
8. Journal of Genetics and Genomics: 36 papers in training set, Top 0.5%, 2.8%
9. BMC Genomics: 328 papers in training set, Top 1%, 2.6%
10. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 3%, 2.4%
11. Briefings in Bioinformatics: 326 papers in training set, Top 3%, 2.4%
12. Cell Genomics: 162 papers in training set, Top 3%, 1.9%
13. Nature Communications: 4913 papers in training set, Top 51%, 1.7%
14. PLOS ONE: 4510 papers in training set, Top 54%, 1.7%
15. GigaScience: 172 papers in training set, Top 2%, 1.5%
16. BMC Bioinformatics: 383 papers in training set, Top 5%, 1.5%
17. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.5%
18. Genome Research: 409 papers in training set, Top 3%, 1.3%
19. Bioinformatics Advances: 184 papers in training set, Top 3%, 1.3%
20. Bioinformatics: 1061 papers in training set, Top 8%, 1.3%
21. Genes: 126 papers in training set, Top 2%, 1.2%
22. Human Genomics: 21 papers in training set, Top 0.2%, 1.2%
23. BMC Biology: 248 papers in training set, Top 2%, 1.1%
24. Mobile DNA: 27 papers in training set, Top 0.2%, 1.0%
25. Communications Biology: 886 papers in training set, Top 18%, 0.9%
26. European Journal of Human Genetics: 49 papers in training set, Top 1%, 0.9%
27. Human Genetics and Genomics Advances: 70 papers in training set, Top 0.8%, 0.8%
28. Human Genetics: 25 papers in training set, Top 0.4%, 0.7%
29. Genomics: 60 papers in training set, Top 3%, 0.6%
30. Human Molecular Genetics: 130 papers in training set, Top 5%, 0.5%
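The statement that the top 6 journals account for 50% of the predicted probability mass can be checked against the listed percentages: because the journals are ranked by predicted probability, the cut-off is simply the first rank at which the running sum reaches 50%. The sketch below assumes that reading of the page; the helper name `journals_to_reach` is hypothetical, and the values are the rounded percentages shown for the first ten journals.

```python
from typing import Optional, Sequence

# Predicted probabilities (%) of the ten highest-ranked journals in the list,
# as displayed on the page (rounded to one decimal place).
TOP_PROBS = [22.7, 9.2, 8.5, 4.9, 4.0, 3.6, 3.1, 2.8, 2.6, 2.4]


def journals_to_reach(mass: float, ranked: Sequence[float]) -> Optional[int]:
    """Smallest k such that the k highest-ranked journals reach `mass` percent."""
    total = 0.0
    for k, p in enumerate(ranked, start=1):
        total += p
        if total >= mass:
            return k
    return None  # the supplied journals never reach the requested mass


print(journals_to_reach(50.0, TOP_PROBS))
```

The first five journals sum to 49.3%, and adding Genome Biology (3.6%) brings the total to 52.9%, so six journals are needed to cross the 50% mark.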