Classical HLA Allele and Haplotype Frequency Estimates in US Populations

Gragert, L.; Madbouly, A.; Bashyal, P.; Wadsworth, K.; Kempenich, J.; Bolon, Y.-T.; Maiers, M.

2026-04-13 genetics
10.64898/2026.04.09.717537 bioRxiv
The human leukocyte antigen (HLA) system is the primary determinant of donor selection in allogeneic hematopoietic cell transplantation (HCT) and plays a central role in solid organ transplantation, immune-mediated disease studies, evolutionary population genetics, and immunotherapy. Large-scale sampling of registry participants reflecting major US ancestry groups allows for characterization of the complex landscape of HLA haplotype diversity for the classical HLA class I (HLA-A, HLA-B, HLA-C) and HLA class II (HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1) genes. Here we present nine-locus classical HLA allele and haplotype frequency estimates for five broad (Black, White, Asian or Pacific Islander, Hispanic and Native American) and 21 detailed US populations based on 9,671,082 donors with targeted genotyping by DNA-based methods. Frequency estimation used an expectation-maximization (EM) framework specifically adapted to handle mixed-resolution and ambiguous HLA genotyping data. Advancements in next-generation sequencing provide extensive HLA genotyping, offering new insights into the haplotype structure and diversity of the human MHC complex, expanding knowledge especially for HLA class II haplotypes. Population analyses reveal that the most common high-resolution haplotypes are predominantly population-specific, with only three haplotypes shared across the top-100 lists of all five broad population groups, and that Black populations exhibit the greatest nine-locus haplotypic diversity, a pattern that persists after controlling for differences in registry sample size. These frequencies, derived from the largest US cohort to date, support clinical decision-making and research in histocompatibility, immunogenetics, and transplantation and are publicly available at https://zenodo.org/records/17966993.
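
The abstract names an expectation-maximization (EM) framework but does not describe it. As a rough illustration only, the sketch below shows the textbook EM iteration for haplotype frequencies from unphased genotypes, assuming Hardy-Weinberg proportions and fully resolved alleles. It is not the authors' implementation and does not handle the mixed-resolution or ambiguous typings the abstract mentions. The function names and the toy alleles (`A1`, `B1`, and so on) are hypothetical.

```python
from collections import defaultdict
from itertools import product


def phasings(genotype):
    """Return every haplotype pair consistent with an unphased genotype.

    genotype: tuple of (allele, allele) pairs, one pair per locus.
    """
    first, rest = genotype[0], genotype[1:]
    pairs = set()
    # Keep the first locus fixed and try both orientations of every other locus.
    for flips in product([False, True], repeat=len(rest)):
        h1, h2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=100, tol=1e-9):
    """Estimate haplotype frequencies by EM (toy version, assumes HWE)."""
    options = [phasings(g) for g in genotypes]
    haps = sorted({h for opts in options for pair in opts for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for opts in options:
            # E-step: posterior weight of each phasing under current frequencies.
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in opts]
            total = sum(weights)
            for (h1, h2), w in zip(opts, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq
```

In this toy setting, a double heterozygote is phase-ambiguous on its own, and the homozygous individuals in the sample are what pull the estimate toward one phasing. Registry-scale estimation over many loci would need a far more economical treatment of the phasing space than this exhaustive enumeration.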

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Genetics | 240 papers in training set | Top 0.2% | 18.2%
2. Genome Medicine | 154 papers in training set | Top 0.5% | 9.9%
3. Cell Genomics | 162 papers in training set | Top 0.2% | 8.9%
4. Science Translational Medicine | 111 papers in training set | Top 0.3% | 6.2%
5. The American Journal of Human Genetics | 206 papers in training set | Top 0.8% | 6.2%
6. Nature Communications | 4913 papers in training set | Top 34% | 4.7%

50% of probability mass above

7. Blood | 67 papers in training set | Top 0.5% | 3.5%
8. Cell Reports | 1338 papers in training set | Top 22% | 1.8%
9. Cell | 370 papers in training set | Top 11% | 1.7%
10. Science | 429 papers in training set | Top 14% | 1.7%
11. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 31% | 1.7%
12. eLife | 5422 papers in training set | Top 43% | 1.7%
13. Nature | 575 papers in training set | Top 11% | 1.7%
14. Genome Biology | 555 papers in training set | Top 5% | 1.6%
15. Nature Medicine | 117 papers in training set | Top 3% | 1.5%
16. Diabetologia | 36 papers in training set | Top 0.7% | 1.3%
17. Science Advances | 1098 papers in training set | Top 24% | 1.2%
18. JCI Insight | 241 papers in training set | Top 5% | 1.2%
19. Blood Advances | 54 papers in training set | Top 1.0% | 1.1%
20. PLOS ONE | 4510 papers in training set | Top 65% | 0.9%
21. Nucleic Acids Research | 1128 papers in training set | Top 16% | 0.9%
22. PLOS Genetics | 756 papers in training set | Top 14% | 0.8%
23. Human Molecular Genetics | 130 papers in training set | Top 4% | 0.7%
24. Communications Biology | 886 papers in training set | Top 25% | 0.7%
25. Stem Cell Reports | 118 papers in training set | Top 1% | 0.7%
26. Nature Biotechnology | 147 papers in training set | Top 8% | 0.7%
27. Molecular Therapy | 71 papers in training set | Top 3% | 0.7%
28. Human Genetics and Genomics Advances | 70 papers in training set | Top 0.8% | 0.7%
29. PLOS Computational Biology | 1633 papers in training set | Top 26% | 0.7%
30. Scientific Reports | 3102 papers in training set | Top 77% | 0.7%
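
The "50% of probability mass" cutoff can be checked with a running sum over the listed probabilities. The sketch below assumes the cutoff is the smallest rank at which the cumulative predicted probability reaches 50%; the page does not state this rule explicitly, and the function name is hypothetical.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Return (rank, cumulative %) at which the running sum first reaches threshold."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None  # threshold never reached


# Predicted probabilities (%) for the seven highest-ranked journals listed above.
top_probs = [18.2, 9.9, 8.9, 6.2, 6.2, 4.7, 3.5]
rank, mass = journals_to_reach(top_probs)
print(rank, round(mass, 1))
```

Under that reading, the first five journals sum to 49.4% and the sixth brings the total to 54.1%, which is consistent with the cutoff falling after rank 6.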