ArchaicSeeker 3.0: A deep-learning framework for scalable, haplotype-resolved inference of archaic introgression

Wang, B.; Lei, C.; Lin, H.; Shi, S.; Ma, X.; Zeng, W.; Yuan, K.; Ni, X.; Xu, S.

2026-05-06 bioinformatics
10.64898/2026.05.05.722798 bioRxiv

Archaic introgression has left a significant mark on human genetic diversity, but reliably identifying introgressed segments remains a major challenge, especially with complex demographic histories and limited sample sizes. Existing methods often rely on demographic assumptions or cohort-specific parameter fitting, which compromises robustness and scalability. We introduce ArchaicSeeker 3.0 (AS3), a deep-learning framework designed for haplotype-resolved detection of archaic introgression. AS3 integrates a tract-scale sequence model with an overlap-aware reassembly approach and boundary refinement, enabling accurate, boundary-coherent reconstruction of introgressed segments across diverse genomic contexts. By leveraging a simulation-trained model, AS3 avoids inference-time recalibration, offering stable performance across unrepresented demographic scenarios and small cohorts. In extensive simulations, AS3 outperforms existing methods in precision, recall, and F1 score, while providing more continuous segments with accurate boundary localization. It demonstrates robustness in small-target regimes and varying marker densities. Applied to 3,453 genomes from 209 populations, AS3 shows strong concordance with existing introgression callers and identifies additional introgressed regions, including high-frequency AS3-specific introgressed segments supported by locus-level haplotype and phylogenetic analyses. AS3 provides a scalable, robust solution for detecting archaic introgression from single individuals to large biobank datasets, marking a significant advancement in the field of local ancestry inference and opening new possibilities for the study of human evolutionary genetics. ArchaicSeeker 3.0 is available at https://github.com/Shuhua-Group/ArchaicSeeker3.0.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank  Journal                                          Papers in training set  Percentile  Probability
1     Nature                                           575                     Top 3%      9.9%
2     Nature Communications                            4913                    Top 20%     9.9%
3     Science                                          429                     Top 3%      9.9%
4     Nature Methods                                   336                     Top 1%      9.9%
5     The American Journal of Human Genetics           206                     Top 0.7%    6.7%
6     Nature Genetics                                  240                     Top 1%      6.2%
--- 50% of probability mass above ---
7     Nature Biotechnology                             147                     Top 2%      6.2%
8     Genome Biology                                   555                     Top 2%      4.7%
9     Nucleic Acids Research                           1128                    Top 5%      4.1%
10    Genome Medicine                                  154                     Top 2%      3.5%
11    Genome Research                                  409                     Top 1%      2.7%
12    Bioinformatics                                   1061                    Top 6%      2.5%
13    Cell Systems                                     167                     Top 5%      2.5%
14    Nature Computational Science                     50                      Top 0.7%    1.7%
15    Bioinformatics Advances                          184                     Top 3%      1.7%
16    Cell Genomics                                    162                     Top 4%      1.7%
17    Briefings in Bioinformatics                      326                     Top 5%      1.3%
18    Advanced Science                                 249                     Top 16%     0.9%
19    BMC Bioinformatics                               383                     Top 7%      0.8%
20    Proceedings of the National Academy of Sciences  2130                    Top 44%     0.8%
21    Scientific Reports                               3102                    Top 73%     0.8%
22    Cell                                             370                     Top 17%     0.8%
23    Nature Medicine                                  117                     Top 5%      0.7%
24    PLOS Computational Biology                       1633                    Top 26%     0.7%
25    European Journal of Human Genetics               49                      Top 1%      0.7%
26    Nature Machine Intelligence                      61                      Top 4%      0.6%
27    Science Advances                                 1098                    Top 34%     0.6%
28    Molecular Biology and Evolution                  488                     Top 5%      0.6%
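
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, the values are copied from the listing in rank order, and it assumes the "50%" cutoff means the smallest number of top-ranked journals whose cumulative probability reaches the threshold.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1 to 28, copied from the listing.
probs = [
    9.9, 9.9, 9.9, 9.9, 6.7, 6.2, 6.2, 4.7, 4.1, 3.5,
    2.7, 2.5, 2.5, 1.7, 1.7, 1.7, 1.3, 0.9, 0.8, 0.8,
    0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6,
]


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probabilities), start=1):
        # Small tolerance guards against floating-point rounding.
        if total >= threshold - 1e-9:
            return count
    return None


if __name__ == "__main__":
    print(journals_to_reach(probs, 50.0))
    print(round(sum(probs), 1))
```

With the listed values, the running total after five journals is 46.3% and after six is 52.5%, which is consistent with the divider placed after rank 6. The 28 listed journals sum to 93.1%, so the remaining mass presumably belongs to journals not shown.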