
The human pangenome reference reduces ancestry-related biases in somatic mutation detection

Pham, C. V. K.; Abdelmalek, F. S. A.; Hua, T.; Apel, E.; Bizjak, A.; Schmidt, E. J.; Houlahan, K. E.

2026-04-01 bioinformatics
10.64898/2026.03.30.715289 bioRxiv

Commonly used human reference genomes collapse extensive genetic variability into a single linear genome, of which 70% is derived from one donor. These linear genomes fail to capture the full spectrum of genetic variation, which can lead to misalignment of sequencing reads, particularly for individuals underrepresented in the linear reference genomes. To address this shortcoming, the Human Pangenome Reference Consortium released the first draft of the human pangenome reference, a graph-based reference that integrates diverse haplotypes. While the human pangenome reference has shown increased accuracy in detecting inherited DNA variants, it remains to be seen whether the observed improvements extend to somatic mutation detection. Here, we systematically benchmarked somatic single-nucleotide variant (SNV) detection using the human pangenome in 30 whole-exome-sequenced bladder tumours with matched blood tissue from individuals of diverse ancestries. We found that somatic SNV detection using the human pangenome reference outperformed detection using the linear reference, most notably in individuals of East Asian ancestry, where we observed on average a 20% improvement in detection accuracy. Improvements to detection accuracy in individuals of European ancestry were marginal. The increase in accuracy was attributed to reduced germline contamination and reduced reference bias. Further, we demonstrate that the pangenome increases SNV detection precision, mitigating the need for time-consuming and computationally expensive ensemble approaches that take the consensus across multiple tools. Finally, we demonstrate that the increased precision obtained when aligning to the pangenome generalized to an additional 29 lung adenocarcinoma tumours, particularly for individuals of East Asian ancestry. These findings support adoption of the pangenome to improve somatic variant detection and reduce ancestry-related disparities.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Genome Medicine: 154 papers in training set, Top 0.1%, predicted probability 18.5%
2. Nature Communications: 4913 papers in training set, Top 8%, predicted probability 17.4%
3. Cell Genomics: 162 papers in training set, Top 0.2%, predicted probability 9.1%
4. Genome Biology: 555 papers in training set, Top 1%, predicted probability 6.3%
50% of probability mass above
5. Nucleic Acids Research: 1128 papers in training set, Top 4%, predicted probability 4.8%
6. Nature Biotechnology: 147 papers in training set, Top 2%, predicted probability 4.3%
7. Scientific Reports: 3102 papers in training set, Top 37%, predicted probability 3.6%
8. Bioinformatics: 1061 papers in training set, Top 6%, predicted probability 2.6%
9. The American Journal of Human Genetics: 206 papers in training set, Top 2%, predicted probability 2.4%
10. Communications Biology: 886 papers in training set, Top 6%, predicted probability 1.9%
11. Briefings in Bioinformatics: 326 papers in training set, Top 4%, predicted probability 1.7%
12. BMC Bioinformatics: 383 papers in training set, Top 5%, predicted probability 1.7%
13. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, predicted probability 1.7%
14. Bioinformatics Advances: 184 papers in training set, Top 3%, predicted probability 1.7%
15. PLOS ONE: 4510 papers in training set, Top 60%, predicted probability 1.2%
16. PLOS Computational Biology: 1633 papers in training set, Top 20%, predicted probability 1.2%
17. Microbial Genomics: 204 papers in training set, Top 2%, predicted probability 0.9%
18. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 8%, predicted probability 0.8%
19. Frontiers in Genetics: 197 papers in training set, Top 10%, predicted probability 0.7%
20. Cell Systems: 167 papers in training set, Top 13%, predicted probability 0.7%
21. Cell: 370 papers in training set, Top 19%, predicted probability 0.6%
22. Genome Research: 409 papers in training set, Top 5%, predicted probability 0.6%
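
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a running sum over the ranked probabilities. The following is a minimal sketch, not the page's own method: the function name `journals_to_reach` is hypothetical, and the values are the first ten predicted probabilities copied from the ranking above.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for the ten highest-ranked journals,
# copied from the ranking above.
probabilities = [18.5, 17.4, 9.1, 6.3, 4.8, 4.3, 3.6, 2.6, 2.4, 1.9]


def journals_to_reach(probabilities, threshold=50.0):
    """Return how many top-ranked entries are needed for the running sum
    to reach `threshold`, or None if it is never reached.

    Hypothetical helper for illustration only.
    """
    for count, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return count
    return None


print(journals_to_reach(probabilities))  # 4
```

The running totals are 18.5, 35.9, 45.0 and 51.3, so the threshold is first crossed at the fourth journal, consistent with the summary line.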