
Improving Local Ancestry Inference through Neural Networks

Medina Tretmanis, J.; Avila-Arcos, M. C.; Jay, F.; Huerta-Sanchez, E.

2026-03-13 bioinformatics
10.64898/2026.03.11.711082 bioRxiv

Motivation: Local Ancestry Inference (LAI) allows us to study evolutionary processes in admixed populations[1], uncover ancestry-specific disease risk factors[2], and better understand the demographic history of these populations[3]. Many methods for LAI exist; however, they usually focus on cases of intercontinental admixture. In this work, we evaluate both existing and novel methods in challenging scenarios, such as downsampled reference panels, intracontinental admixture, and distant admixture events.

Results: We present four novel LAI implementations based on neural network architectures, including Bidirectional Long Short-Term Memory (BiLSTM) and Transformer networks, which have not previously been used for LAI. We compare these implementations to existing LAI methods across a variety of scenarios using the 1000 Genomes dataset and other synthetic datasets. We find that while all networks achieve high performance in intercontinental admixture scenarios, inference power is comparatively low for intracontinental or distant admixture. We further show that, through specialized preprocessing and inference smoothing steps, our implementations achieve the best performance among all methods evaluated.

Availability: All implementations and benchmarking code are available at https://github.com/Jazpy/LAINNs.
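The abstract mentions inference smoothing but does not specify it on this page. As a hedged illustration only, one common form of smoothing in window-based LAI averages each genomic window's predicted ancestry probabilities with those of neighboring windows before picking the most likely ancestry, which suppresses isolated and likely spurious ancestry switches. This is a minimal sketch assuming per-window probability vectors for a single haplotype as input; the function names `smooth_window_probs` and `call_ancestry` and the `radius` parameter are hypothetical and are not taken from the LAINNs repository.

```python
from typing import List, Sequence


def smooth_window_probs(
    probs: Sequence[Sequence[float]], radius: int = 2
) -> List[List[float]]:
    """Average each window's ancestry probabilities with its neighbors.

    Assumption (hypothetical input format): probs[i][k] is the predicted
    probability that window i of one haplotype derives from ancestry k.
    Windows near the chromosome ends are averaged over whichever neighbors exist.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(probs)
    smoothed: List[List[float]] = []
    for i in range(n):
        # Centered neighborhood [i - radius, i + radius], clipped to the chromosome.
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        neighborhood = probs[lo:hi]
        n_ancestries = len(probs[i])
        smoothed.append(
            [
                sum(row[k] for row in neighborhood) / len(neighborhood)
                for k in range(n_ancestries)
            ]
        )
    return smoothed


def call_ancestry(probs: Sequence[Sequence[float]], radius: int = 2) -> List[int]:
    """Return the most likely ancestry index for each window after smoothing."""
    return [
        max(range(len(row)), key=row.__getitem__)
        for row in smooth_window_probs(probs, radius)
    ]
```

For example, with per-window probabilities `[[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.85, 0.15], [0.9, 0.1]]` and `radius=1`, the isolated call of ancestry 1 in the middle window is averaged away and `call_ancestry` returns `[0, 0, 0, 0, 0]`. The radius is a trade-off: a larger value removes more short switches but can also erase genuine short ancestry tracts, which are expected after older (more distant) admixture events because recombination breaks tracts down over generations.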

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Bioinformatics (1061 papers in training set; Top 0.4%; 40.2%)
2. Bioinformatics Advances (184 papers in training set; Top 0.1%; 10.3%)
50% of probability mass above
3. BMC Bioinformatics (383 papers in training set; Top 1%; 7.3%)
4. PLOS Genetics (756 papers in training set; Top 3%; 4.9%)
5. The American Journal of Human Genetics (206 papers in training set; Top 0.9%; 4.9%)
6. Briefings in Bioinformatics (326 papers in training set; Top 2%; 3.7%)
7. Nature Communications (4913 papers in training set; Top 50%; 1.7%)
8. Frontiers in Genetics (197 papers in training set; Top 5%; 1.7%)
9. PLOS Computational Biology (1633 papers in training set; Top 16%; 1.7%)
10. Genome Biology (555 papers in training set; Top 5%; 1.5%)
11. G3: Genes, Genomes, Genetics (222 papers in training set; Top 0.6%; 1.2%)
12. Scientific Reports (3102 papers in training set; Top 68%; 1.1%)
13. Nature Computational Science (50 papers in training set; Top 1%; 1.0%)
14. Human Genetics and Genomics Advances (70 papers in training set; Top 0.6%; 0.9%)
15. Computational and Structural Biotechnology Journal (216 papers in training set; Top 7%; 0.9%)
16. Methods in Ecology and Evolution (160 papers in training set; Top 2%; 0.9%)
17. Genome Research (409 papers in training set; Top 4%; 0.8%)
18. Communications Biology (886 papers in training set; Top 23%; 0.8%)
19. Genetics (225 papers in training set; Top 4%; 0.8%)
20. PLOS ONE (4510 papers in training set; Top 67%; 0.8%)
21. GENETICS (189 papers in training set; Top 1%; 0.8%)
22. G3 Genes|Genomes|Genetics (351 papers in training set; Top 3%; 0.7%)
23. Nucleic Acids Research (1128 papers in training set; Top 19%; 0.7%)
24. Forensic Science International: Genetics (24 papers in training set; Top 0.1%; 0.7%)
25. Peer Community Journal (254 papers in training set; Top 4%; 0.7%)
26. Nature Genetics (240 papers in training set; Top 8%; 0.7%)
27. Cell Genomics (162 papers in training set; Top 7%; 0.7%)