
KIR*BLOOM: Accurate KIR genotyping using a new copy number-aware integrated genotype likelihood framework

Gohar, Y.; Garcia, A. D.; Kichula, K. M.; Norman, P. J.; Dilthey, A. T.

2026-04-20 bioinformatics
10.64898/2026.04.15.718735 bioRxiv

Killer-cell immunoglobulin-like receptor (KIR) genes, key modulators of natural killer (NK) cell activity, play critical roles in immune response and disease susceptibility. Accurate KIR genotyping from short-read sequencing data remains challenging because of high sequence similarity among genes, extensive copy number variation, and substantial allelic diversity. Here, we present KIR*BLOOM, a likelihood-based approach for KIR genotyping from short-read data that models read depth and sequencing error across alternative genotype configurations. KIR*BLOOM first identifies KIR-relevant read pairs, maps them to a KIR allele database, and reduces the candidate allele space by excluding alleles unlikely to be present. It then infers gene copy number and selects alleles under the inferred copy-number constraints. Finally, variant calling is used to refine CDS sequences and identify potential novel alleles. We evaluated performance on 45 whole-genome sequencing samples with haplotype-resolved assemblies from the HPRC or HGSVC, using Immuannot-derived annotations as ground truth. KIR*BLOOM achieved 99.85% precision, 99.92% recall, and a Jaccard index of 99.77% for copy-number inference. At five-digit allele resolution, it achieved 92.73% precision, 92.69% recall, and an 87.29% Jaccard index, outperforming T1K, GraphKIR, and Geny. Together, these results demonstrate that KIR*BLOOM enables highly accurate KIR genotyping from short-read sequencing data.
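The precision, recall, and Jaccard figures above compare called alleles against ground-truth alleles. The abstract does not spell out how these metrics are computed, so the following is only a minimal sketch under one plausible assumption: each sample's alleles are treated as a multiset (so that a gene present in two copies counts twice), and the three metrics are derived from multiset intersection and union. The function name `multiset_metrics` and the allele strings are illustrative placeholders, not part of KIR*BLOOM.

```python
from collections import Counter


def multiset_metrics(truth, called):
    """Return (precision, recall, jaccard) for two allele multisets.

    Assumption: alleles are compared as exact strings, and repeated
    alleles (multiple gene copies) are counted with multiplicity.
    """
    t, c = Counter(truth), Counter(called)
    # Multiset intersection keeps the minimum count per allele (true positives).
    tp = sum((t & c).values())
    # Multiset union keeps the maximum count per allele.
    union = sum((t | c).values())
    precision = tp / sum(c.values()) if c else 0.0
    recall = tp / sum(t.values()) if t else 0.0
    # Two empty multisets are treated as identical by convention.
    jaccard = tp / union if union else 1.0
    return precision, recall, jaccard


# Illustrative placeholder alleles; one of three calls is wrong.
truth = ["KIR2DL1*00302", "KIR2DL1*00201", "KIR3DL1*00101"]
called = ["KIR2DL1*00302", "KIR2DL1*00302", "KIR3DL1*00101"]
precision, recall, jaccard = multiset_metrics(truth, called)
print(f"precision={precision:.3f} recall={recall:.3f} jaccard={jaccard:.3f}")
```

In this toy case two of the three called alleles match, so precision and recall are both 2/3, while the Jaccard index is lower (2/4) because the union counts the wrong call and the missed allele separately. This is consistent with the abstract reporting a Jaccard index below both precision and recall.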

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Genome Medicine (154 papers in training set, Top 0.2%): 16.8%
2. Bioinformatics (1061 papers in training set, Top 3%): 9.7%
3. Nature Communications (4913 papers in training set, Top 26%): 6.9%
4. Nature Biotechnology (147 papers in training set, Top 2%): 6.1%
5. Genome Biology (555 papers in training set, Top 1%): 6.1%
6. Genome Research (409 papers in training set, Top 0.5%): 6.1%
50% of probability mass above
7. Nucleic Acids Research (1128 papers in training set, Top 4%): 4.7%
8. Bioinformatics Advances (184 papers in training set, Top 2%): 3.4%
9. Cell Systems (167 papers in training set, Top 4%): 3.4%
10. Briefings in Bioinformatics (326 papers in training set, Top 2%): 3.1%
11. Nature Methods (336 papers in training set, Top 3%): 3.0%
12. PLOS Computational Biology (1633 papers in training set, Top 13%): 2.4%
13. The American Journal of Human Genetics (206 papers in training set, Top 2%): 2.0%
14. Nature (575 papers in training set, Top 10%): 2.0%
15. BMC Bioinformatics (383 papers in training set, Top 5%): 1.4%
16. PLOS ONE (4510 papers in training set, Top 57%): 1.4%
17. Communications Biology (886 papers in training set, Top 13%): 1.3%
18. Scientific Reports (3102 papers in training set, Top 65%): 1.3%
19. Nature Machine Intelligence (61 papers in training set, Top 3%): 1.2%
20. Cell Genomics (162 papers in training set, Top 5%): 1.2%
21. Cell Reports Methods (141 papers in training set, Top 4%): 1.2%
22. Nature Computational Science (50 papers in training set, Top 1%): 1.2%
23. Proceedings of the National Academy of Sciences (2130 papers in training set, Top 41%): 0.9%
24. NAR Genomics and Bioinformatics (214 papers in training set, Top 3%): 0.9%
25. Genomics, Proteomics & Bioinformatics (171 papers in training set, Top 5%): 0.9%
26. Nature Genetics (240 papers in training set, Top 6%): 0.9%
27. iScience (1063 papers in training set, Top 31%): 0.8%
28. Frontiers in Genetics (197 papers in training set, Top 9%): 0.8%
29. Science (429 papers in training set, Top 20%): 0.7%
30. Advanced Science (249 papers in training set, Top 22%): 0.6%