
Attentive-SPIDNA: Attention-based neural networks for population genetics

Sanchez, T.; Jobic, P.; Regan, C.; Verdu, P.; Charpiat, G.; Jay, F.

2026-04-18 evolutionary biology
10.64898/2026.04.15.718687 bioRxiv

Artificial neural networks (ANNs) have recently offered new perspectives for solving inference problems on high-dimensional data in numerous scientific fields, but it remains unclear which architectures are best suited to genomic data. Here, we present a new ANN architecture that integrates attention mechanisms to infer effective population size history from genomic data. Building on our previous exchangeable architecture SPIDNA, Attentive-SPIDNA adds attention layers that compute more expressive and complex features from combinations of haplotypes. The contribution of each haplotype to the features is learned automatically and depends on its content and its affinity with the other haplotypes. We use the same mechanism to learn a voting scheme that aggregates predictions from different genomic regions. This new architecture outperforms approximate Bayesian computation and previously published neural networks while relying directly on raw genetic data and remaining invariant to haplotype permutation in the input. As a proof of concept, we use this architecture to infer the effective population size history of 54 populations from the HGDP dataset (Bergstrom et al., 2020). This application highlights the ability of the network to handle data with a varying number of haplotypes and to quickly produce predictions for datasets that include numerous populations. The proposed mechanism could therefore be integrated into various neural networks that solve population genetics tasks.
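
The permutation invariance mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the Attentive-SPIDNA implementation: it assumes a single self-attention head over per-haplotype feature vectors followed by mean pooling, and the function name `attentive_pool`, the weight matrices, and all dimensions are hypothetical choices made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(features, w_q, w_k, w_v):
    """Illustrative attention over haplotypes (assumed design, not the paper's).

    features: array of shape (n_haplotypes, d), one feature vector per haplotype.
    Returns a vector of shape (d_v,) that does not depend on haplotype order.
    """
    q = features @ w_q                           # queries, (n, d_k)
    k = features @ w_k                           # keys,    (n, d_k)
    v = features @ w_v                           # values,  (n, d_v)
    scores = q @ k.T / np.sqrt(k.shape[-1])      # pairwise affinities, (n, n)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    mixed = weights @ v                          # permutation-equivariant, (n, d_v)
    return mixed.mean(axis=0)                    # pooling makes it invariant

rng = np.random.default_rng(0)
n_hap, d = 6, 4                                  # hypothetical sizes
features = rng.normal(size=(n_hap, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

summary = attentive_pool(features, w_q, w_k, w_v)

# Shuffling the haplotypes leaves the pooled summary unchanged.
perm = rng.permutation(n_hap)
summary_perm = attentive_pool(features[perm], w_q, w_k, w_v)
assert np.allclose(summary, summary_perm)
```

Because the attention weights are computed from the haplotypes themselves and the final step is a mean over haplotypes, the same function accepts any number of haplotypes, for example `attentive_pool(features[:3], w_q, w_k, w_v)`.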

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics (1061 papers in training set): Top 2%, 18.2%
2. Bioinformatics Advances (184 papers in training set): Top 0.1%, 12.1%
3. Molecular Biology and Evolution (488 papers in training set): Top 0.3%, 10.2%
4. Frontiers in Genetics (197 papers in training set): Top 0.7%, 6.7%
5. Genome Research (409 papers in training set): Top 0.4%, 6.2%

50% of probability mass above

6. Genome Biology and Evolution (280 papers in training set): Top 0.3%, 4.8%
7. BMC Genomics (328 papers in training set): Top 0.7%, 3.9%
8. PLOS Computational Biology (1633 papers in training set): Top 10%, 3.5%
9. GENETICS (189 papers in training set): Top 0.3%, 3.5%
10. PLOS Genetics (756 papers in training set): Top 6%, 2.7%
11. Genetics (225 papers in training set): Top 2%, 2.3%
12. G3 Genes|Genomes|Genetics (351 papers in training set): Top 1%, 2.0%
13. European Journal of Human Genetics (49 papers in training set): Top 0.5%, 2.0%
14. Molecular Ecology Resources (161 papers in training set): Top 0.5%, 1.8%
15. eLife (5422 papers in training set): Top 43%, 1.7%
16. G3: Genes, Genomes, Genetics (222 papers in training set): Top 0.5%, 1.5%
17. Nature Communications (4913 papers in training set): Top 54%, 1.5%
18. NAR Genomics and Bioinformatics (214 papers in training set): Top 3%, 1.2%
19. Methods in Ecology and Evolution (160 papers in training set): Top 2%, 0.9%
20. Nature Computational Science (50 papers in training set): Top 1%, 0.9%
21. Proceedings of the National Academy of Sciences (2130 papers in training set): Top 45%, 0.7%
22. Communications Biology (886 papers in training set): Top 27%, 0.7%
23. BMC Bioinformatics (383 papers in training set): Top 8%, 0.6%