Benchmarking of local ancestry inference with different assays and parameters

Motegi, T.; Huang, F.; Campbell, J. D.

2026-05-21 genomics
10.64898/2026.05.18.726085 bioRxiv

Local ancestry inference (LAI) enables high-resolution characterization of chromosomal segments inherited from distinct ancestral populations, offering unique insights into genetic architecture in admixed cohorts. While LAI is commonly performed with high-coverage whole-genome sequencing (WGS), its performance with other genotyping assays or at varying sequencing depths has not been thoroughly benchmarked. In this study, we systematically evaluated the accuracy of LAI across SNP microarrays, whole-exome sequencing (WES), and ultra-low-pass WGS (ULP-WGS) using diverse validation samples and state-of-the-art imputation pipelines. We show that ULP-WGS, when paired with GLIMPSE2, achieves robust accuracy at 0.25x coverage with a minimum genome window size of 0.5 centimorgans, with mean accuracy minus one standard deviation exceeding 95%. For WES, using "on-target" reads alone yields suboptimal performance, particularly for European and South Asian ancestries, with accuracies below 79.1% and 70.6%, respectively. However, incorporating "off-target" WES reads and using GLIMPSE2 substantially improved accuracy, to ≥95% at a minimum window size of 0.2 centimorgans. We further evaluated formalin-fixed, paraffin-embedded (FFPE) samples and found that LAI could be performed successfully using WES data, with accuracies of ≥95% at a minimum window size of 0.5 centimorgans. In contrast, SNP microarrays did not achieve sufficient accuracy at any window size (≤95%). Together, these results demonstrate that LAI is achievable without conventional high-coverage WGS and establish optimal parameters for LAI across platforms.
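The abstract's pass criterion can be made concrete with a small sketch: mean accuracy minus one standard deviation across validation samples, evaluated at a given genome window size in centimorgans. The abstract does not say how per-window accuracy is scored. The majority-label windowing, the use of the sample standard deviation, and the function names below (`window_labels`, `window_accuracy`, `passes_threshold`, `smallest_passing_window`) are therefore hypothetical choices for illustration and do not represent the authors' pipeline.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical scoring scheme: the abstract does not specify how per-window
# accuracy is computed, so this sketch assumes majority-label windows.


def window_labels(positions_cm, labels, window_cm):
    """Bin sites into fixed-width windows (in cM) and return the majority ancestry label per window."""
    bins = {}
    for pos, label in zip(positions_cm, labels):
        bins.setdefault(int(pos // window_cm), []).append(label)
    # Counter.most_common(1) returns the first-seen label when counts tie.
    return {w: Counter(labs).most_common(1)[0][0] for w, labs in bins.items()}


def window_accuracy(positions_cm, truth, inferred, window_cm):
    """Fraction of windows whose inferred majority label matches the true majority label."""
    true_w = window_labels(positions_cm, truth, window_cm)
    inf_w = window_labels(positions_cm, inferred, window_cm)
    return sum(true_w[w] == inf_w[w] for w in true_w) / len(true_w)


def passes_threshold(accuracies, threshold=0.95):
    """Mean accuracy minus one (sample) standard deviation must reach the threshold."""
    return mean(accuracies) - stdev(accuracies) >= threshold


def smallest_passing_window(samples, window_sizes, threshold=0.95):
    """Smallest window size (cM) at which the criterion holds across all samples, else None.

    Each sample is a (positions_cm, true_labels, inferred_labels) tuple.
    """
    for w in sorted(window_sizes):
        accs = [window_accuracy(pos, tru, inf, w) for pos, tru, inf in samples]
        if passes_threshold(accs, threshold):
            return w
    return None


# Toy example: four sites, two 0.2 cM windows, inferred calls wrong in the second window.
pos = [0.05, 0.15, 0.25, 0.35]
truth = ["AFR", "AFR", "EUR", "EUR"]
print(window_accuracy(pos, truth, ["AFR"] * 4, 0.2))  # 0.5
```

`statistics.stdev` computes the sample standard deviation (n − 1 denominator) and needs at least two samples. If the study used the population standard deviation, `statistics.pstdev` would be the drop-in alternative.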

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. The American Journal of Human Genetics: 206 papers in training set, Top 0.1%, 22.9%
2. Cell Genomics: 162 papers in training set, Top 0.1%, 17.8%
3. Genome Medicine: 154 papers in training set, Top 0.7%, 7.3%
4. Nature Communications: 4913 papers in training set, Top 26%, 6.9%
50% of probability mass above
5. Genome Biology: 555 papers in training set, Top 2%, 4.4%
6. Nature Genetics: 240 papers in training set, Top 2%, 4.2%
7. Human Genetics and Genomics Advances: 70 papers in training set, Top 0.1%, 3.1%
8. Cell: 370 papers in training set, Top 8%, 2.8%
9. Scientific Reports: 3102 papers in training set, Top 49%, 2.1%
10. Nucleic Acids Research: 1128 papers in training set, Top 9%, 1.9%
11. eLife: 5422 papers in training set, Top 37%, 1.9%
12. Frontiers in Genetics: 197 papers in training set, Top 4%, 1.8%
13. Genome Research: 409 papers in training set, Top 2%, 1.7%
14. Briefings in Bioinformatics: 326 papers in training set, Top 4%, 1.7%
15. Human Molecular Genetics: 130 papers in training set, Top 3%, 1.0%
16. Communications Biology: 886 papers in training set, Top 16%, 1.0%
17. Bioinformatics Advances: 184 papers in training set, Top 4%, 0.9%
18. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 43%, 0.8%
19. Science: 429 papers in training set, Top 19%, 0.8%
20. Human Genomics: 21 papers in training set, Top 0.3%, 0.8%
21. Bioinformatics: 1061 papers in training set, Top 10%, 0.7%
22. Cell Reports: 1338 papers in training set, Top 35%, 0.7%
23. BMC Bioinformatics: 383 papers in training set, Top 8%, 0.5%
24. Nature Biotechnology: 147 papers in training set, Top 9%, 0.5%
25. Nature: 575 papers in training set, Top 18%, 0.5%
26. Molecular Ecology Resources: 161 papers in training set, Top 1%, 0.5%
27. PLOS Genetics: 756 papers in training set, Top 18%, 0.5%
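The "50% of probability mass" cutoff after rank 4 can be checked directly against the listed probabilities. The top three journals sum to 48.0%, and adding Nature Communications brings the total to 54.9%. The minimal Python sketch below reproduces the cutoff. The helper `journals_to_reach` is hypothetical and is not part of the matching service.

```python
from itertools import accumulate

# Predicted probabilities (%) for the 27 listed journals, in rank order.
PROBS = [22.9, 17.8, 7.3, 6.9, 4.4, 4.2, 3.1, 2.8, 2.1, 1.9, 1.9, 1.8, 1.7, 1.7,
         1.0, 1.0, 0.9, 0.8, 0.8, 0.8, 0.7, 0.7, 0.5, 0.5, 0.5, 0.5, 0.5]


def journals_to_reach(probs, target):
    """Smallest k such that the top-k probabilities sum to at least `target`, or None."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return k
    return None


print(journals_to_reach(PROBS, 50.0))  # 4
print(round(sum(PROBS), 1))  # 89.7
```

Together, the 27 listed journals account for about 89.7% of the predicted mass. The remainder presumably falls on journals that are not shown.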