Benchmarking HLA genotyping from whole-genome sequencing across multiple sequencing technologies

Cremin, C.; Elavalli, S.; Paulin, L.; Arres Reche, J.; Saad, A. A. Y. A.; Attia, A.; Minas, C.; Aldhuhoori, F.; Katagi, G.; Wu, H.; Sidahmed, H.; Mafofo, J.; Soliman, O.; Behl, S.; Pariyachery, S.; Gupta, V.; Ghanem, D.; Sajjad, H.; Cardoso, T.; El-Khani, A.; Al Marzooqi, F.; Magalhaes, T.; Sedlazeck, F. J.; Quilez, J.

2026-02-12 health informatics
10.64898/2026.02.10.26345621 medRxiv

Background
The hyperpolymorphic nature and structural complexity of the human leukocyte antigen (HLA) genomic region present challenges for accurate and scalable typing across diverse sample types. While whole-genome sequencing (WGS) offers the opportunity to infer HLA genotypes without targeted enrichment, systematic benchmarks across sequencing platforms, biospecimens and coverage levels remain limited.

Results
We assembled a multi-platform resource of WGS datasets derived from short-read (Illumina, MGI) and long-read (Oxford Nanopore Technologies R9 and R10) sequencing, spanning 29 biospecimens, including cell lines, blood, buccal swabs and saliva. We evaluated the performance of the HLA caller HLA*LA across 13 HLA genes, using a clinically validated assay as the reference. WGS-based HLA genotyping achieved ~95% accuracy across sequencing platforms, with Class I loci exhibiting higher accuracy than Class II loci. Cross-platform concordance was high, and performance remained consistent across Illumina, MGI and Oxford Nanopore chemistries. Analysis of blood, buccal swab and saliva samples showed that blood and buccal swabs supported accurate HLA inference, whereas saliva yielded reduced concordance. Downsampling experiments demonstrated that 15x coverage was sufficient to retain >95% accuracy at two-field resolution, with lower depths supporting lower-resolution typing.

Conclusions
Our results demonstrate that WGS provides a robust, platform-agnostic framework for accurate HLA genotyping across sample types and coverage levels. These benchmarks establish practical conditions for reliable HLA inference and underscore the utility of WGS for population-scale HLA analyses and future clinical applications.
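Accuracy "at two-field resolution" means that called and reference alleles are compared only after both are truncated to their first two colon-delimited fields (for example, A*02:01:01:01 becomes A*02:01). The sketch below shows one way such a per-allele concordance could be computed. It is an illustration under stated assumptions, not the authors' scoring procedure: it assumes colon-delimited allele names, ignores expression suffixes and G/P group codes, treats each gene as two unordered allele copies, and uses hypothetical genotypes and helper names (`truncate_allele`, `genotype_concordance`). The `genes` argument lets you score subsets of loci separately, for example Class I versus Class II.

```python
from collections import Counter


def truncate_allele(allele: str, fields: int) -> str:
    """Trim an allele name such as 'A*02:01:01:01' to its first `fields` fields."""
    gene, _, rest = allele.partition("*")
    return f"{gene}*{':'.join(rest.split(':')[:fields])}"


def genotype_concordance(called, truth, fields=2, genes=None):
    """Fraction of reference allele copies recovered by the caller.

    `called` and `truth` map gene name -> list of two allele names.
    Allele order is ignored; homozygous genotypes are handled because a
    Counter intersection keeps the minimum count of each allele.
    """
    matched = total = 0
    for gene, ref_alleles in truth.items():
        if genes is not None and gene not in genes:
            continue  # skip loci outside the requested subset
        ref = Counter(truncate_allele(a, fields) for a in ref_alleles)
        hyp = Counter(truncate_allele(a, fields) for a in called.get(gene, []))
        matched += sum((ref & hyp).values())
        total += len(ref_alleles)
    return matched / total if total else float("nan")


# Hypothetical reference and WGS-derived genotypes for one sample.
TRUTH = {"A": ["A*02:01:01:01", "A*24:02:01"],
         "DRB1": ["DRB1*15:01:01", "DRB1*07:01:01"]}
CALLS = {"A": ["A*24:02:01:01", "A*02:01:01:02"],
         "DRB1": ["DRB1*15:02:01", "DRB1*07:01:01:01"]}

print(genotype_concordance(CALLS, TRUTH, fields=2))                  # 0.75
print(genotype_concordance(CALLS, TRUTH, fields=1))                  # 1.0
print(genotype_concordance(CALLS, TRUTH, fields=2, genes={"DRB1"}))  # 0.5
```

In this made-up example, the DRB1*15:02 call disagrees with the reference DRB1*15:01 at two fields but agrees at one field. This is why lower-resolution typing can remain accurate when higher-resolution calls start to fail.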

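The downsampling experiments reduce a genome to a lower target depth, such as 15x, by keeping only a fraction of the reads. The abstract does not name the tool used. The sketch below assumes one common option, `samtools view -s`, whose argument packs a random seed (integer part) and the fraction of read pairs to keep (fractional part) into a single number. It only builds the command lines and does not run them. The file names, the seed and the helper names are hypothetical. Because subsampling is random and coverage is uneven, the resulting mean depth is only approximately the target, so it should be re-measured afterwards.

```python
def subsample_fraction(observed_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so mean depth falls to roughly target_depth."""
    if observed_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    if target_depth >= observed_depth:
        raise ValueError("target depth must be below the observed depth")
    return target_depth / observed_depth


def samtools_subsample_arg(fraction: float, seed: int = 42) -> str:
    """Encode seed and fraction in the SEED.FRACTION form read by `samtools view -s`."""
    frac = f"{fraction:.6f}"
    if not frac.startswith("0.") or frac == "0.000000":
        raise ValueError("fraction must round to a value strictly between 0 and 1")
    return f"{seed}.{frac[2:]}"  # drop the leading "0." and prepend the seed


def downsampling_commands(bam, observed_depth, targets, seed=42):
    """Build one `samtools view` command (as an argument list) per target depth."""
    stem = bam[:-4] if bam.endswith(".bam") else bam
    cmds = []
    for target in targets:
        arg = samtools_subsample_arg(subsample_fraction(observed_depth, target), seed)
        cmds.append(["samtools", "view", "-b", "-s", arg,
                     "-o", f"{stem}.{target}x.bam", bam])
    return cmds


# Hypothetical 30x genome downsampled to 5x, 10x and 15x.
for cmd in downsampling_commands("sample.bam", 30, [5, 10, 15]):
    print(" ".join(cmd))
```

Reusing the same seed across target depths keeps the runs reproducible. Choosing a different seed per replicate would let you estimate how much accuracy varies from one random subsample to the next at a given depth.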
Matching journals

The top 12 journals account for 50% of the predicted probability mass.

1. Scientific Reports: 3102 papers in training set; Top 4%; 12.5%
2. Frontiers in Digital Health: 20 papers in training set; Top 0.1%; 6.4%
3. PLOS ONE: 4510 papers in training set; Top 27%; 6.4%
4. Nature Communications: 4913 papers in training set; Top 28%; 6.4%
5. Bioinformatics: 1061 papers in training set; Top 4%; 4.9%
6. BMC Bioinformatics: 383 papers in training set; Top 3%; 3.6%
7. Philosophical Transactions of the Royal Society B: 51 papers in training set; Top 2%; 2.1%
8. NAR Genomics and Bioinformatics: 214 papers in training set; Top 1%; 1.9%
9. Journal of Clinical Microbiology: 120 papers in training set; Top 0.9%; 1.8%
10. Genome Medicine: 154 papers in training set; Top 4%; 1.7%
11. Clinical Infectious Diseases: 231 papers in training set; Top 3%; 1.7%
12. Clinical Chemistry: 22 papers in training set; Top 0.4%; 1.7%

50% of probability mass above

13. PLOS Computational Biology: 1633 papers in training set; Top 16%; 1.7%
14. Journal of Infection: 71 papers in training set; Top 1%; 1.7%
15. Wellcome Open Research: 57 papers in training set; Top 0.9%; 1.7%
16. Communications Biology: 886 papers in training set; Top 11%; 1.5%
17. eBioMedicine: 130 papers in training set; Top 2%; 1.5%
18. Cell Reports Medicine: 140 papers in training set; Top 5%; 1.3%
19. Frontiers in Immunology: 586 papers in training set; Top 5%; 1.3%
20. BMC Genomics: 328 papers in training set; Top 3%; 1.2%
21. Science Translational Medicine: 111 papers in training set; Top 4%; 1.2%
22. PNAS Nexus: 147 papers in training set; Top 0.6%; 1.2%
23. Frontiers in Bioinformatics: 45 papers in training set; Top 0.5%; 1.1%
24. JAMA: 17 papers in training set; Top 0.2%; 1.0%
25. Frontiers in Neurology: 91 papers in training set; Top 4%; 0.9%
26. BMC Medicine: 163 papers in training set; Top 6%; 0.9%
27. Journal of Medical Internet Research: 85 papers in training set; Top 4%; 0.8%
28. Briefings in Bioinformatics: 326 papers in training set; Top 6%; 0.8%
29. Viruses: 318 papers in training set; Top 5%; 0.8%
30. Annals of Internal Medicine: 27 papers in training set; Top 0.8%; 0.8%
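The statement that the top 12 journals account for 50% of the predicted probability mass can be checked against the listed values. Take the smallest number of top-ranked entries whose cumulative probability reaches the threshold. The sketch below does this using the 30 percentages transcribed from the list above. Because the page rounds each value to one decimal place, the recomputed sum is approximate (about 51.1% rather than exactly 50%). The helper name `journals_for_mass` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (%) for the 30 listed journals, in rank order,
# transcribed from the list above (rounded to one decimal on the page).
PROBS = [12.5, 6.4, 6.4, 6.4, 4.9, 3.6, 2.1, 1.9, 1.8, 1.7,
         1.7, 1.7, 1.7, 1.7, 1.7, 1.5, 1.5, 1.3, 1.3, 1.2,
         1.2, 1.2, 1.1, 1.0, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8]


def journals_for_mass(probs, threshold=50.0):
    """Smallest k such that the first k (highest-ranked) probabilities sum to >= threshold."""
    for k, cumulative in enumerate(accumulate(probs), start=1):
        if cumulative >= threshold:
            return k
    return None  # the listed entries never reach the threshold


k = journals_for_mass(PROBS)
print(k, round(sum(PROBS[:k]), 1))  # prints: 12 51.1
```

The 11 highest-ranked entries sum to about 49.4%, so the twelfth entry is the one that crosses the 50% line. This matches the position of the "50% of probability mass above" marker.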