Rapid gene exchange explains differences in bacterial pangenome structure

Horsfield, S. T.; Peng, A.; Russell, M. J.; von Wachsmann, J.; Toussaint, J.; D'Aeth, J. C.; Qin, C.; Pesonen, H.; Tonkin-Hill, G.; Corander, J.; Croucher, N. J.; Lees, J. A.

2026-02-06 bioinformatics
10.64898/2026.02.04.703695 bioRxiv
The size and diversity of bacterial gene repertoires, known as pangenomes, vary widely across species. The evolutionary forces driving the maintenance of pangenomes are an open topic of debate, with competing theories suggesting either that pangenomes exist as a result of neutral evolution, with all genes gained and lost at random, or that all genes provide a fitness benefit to the host and are maintained by positive selection. Modelling of pangenome dynamics has provided insight into how gene exchange explains observed gene frequency distributions, and stands as the only means of jointly inferring the contributions of individual gene selection effects and mobility to the maintenance of pangenomes. However, previous modelling studies have not included both gene-level selection and mobility, and have not considered broadly sampled genome datasets for many species. To differentiate neutral and selective forces maintaining pangenomes, we developed a mechanistic model of gene-level evolution, Pansim, and a scalable model fitting framework, PopPUNK-mod. Together, these tools leverage rapid genome distance calculation to fit models of pangenome dynamics to datasets containing hundreds of thousands of genomes. We used this framework to compare the pangenome dynamics of over 400 different bacterial species, using over 600,000 genomes. We find that diversity in pangenome characteristics between species is driven predominantly by variation in the number of rapidly exchanged genes, while the rate of exchange of the remaining genes is conserved. We find that bacterial phylogeny, rather than ecology, correlates with pangenome dynamics. We propose that pan-species gene-level analyses are now needed to understand selection across accessory genes. Our work highlights the importance of gene exchange rate differences in governing differences in pangenome characteristics between species.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | Molecular Biology and Evolution | 488 papers in training set | Top 0.1% | 22.3%
2 | PLOS Computational Biology | 1633 papers in training set | Top 2% | 12.2%
3 | Cell Systems | 167 papers in training set | Top 0.8% | 12.2%
4 | Genome Biology | 555 papers in training set | Top 1% | 4.8%
50% of probability mass above
5 | Genetics | 225 papers in training set | Top 1% | 3.9%
6 | mSystems | 361 papers in training set | Top 2% | 3.9%
7 | Nucleic Acids Research | 1128 papers in training set | Top 6% | 3.6%
8 | Nature Communications | 4913 papers in training set | Top 40% | 3.6%
9 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 24% | 2.9%
10 | Virus Evolution | 140 papers in training set | Top 0.5% | 2.7%
11 | eLife | 5422 papers in training set | Top 36% | 2.1%
12 | Genome Biology and Evolution | 280 papers in training set | Top 1% | 1.7%
13 | Science Advances | 1098 papers in training set | Top 19% | 1.6%
14 | PLOS Genetics | 756 papers in training set | Top 10% | 1.5%
15 | Cell Host & Microbe | 113 papers in training set | Top 4% | 1.2%
16 | iScience | 1063 papers in training set | Top 25% | 0.9%
17 | Cell Genomics | 162 papers in training set | Top 5% | 0.9%
18 | Science | 429 papers in training set | Top 18% | 0.9%
19 | Biophysical Journal | 545 papers in training set | Top 5% | 0.8%
20 | Molecular Systems Biology | 142 papers in training set | Top 2% | 0.7%
21 | Cell Reports | 1338 papers in training set | Top 33% | 0.7%
22 | Microbial Genomics | 204 papers in training set | Top 2% | 0.7%
23 | Genome Research | 409 papers in training set | Top 4% | 0.7%
24 | PLOS Pathogens | 721 papers in training set | Top 9% | 0.7%
25 | Communications Biology | 886 papers in training set | Top 27% | 0.7%
26 | Nature Biotechnology | 147 papers in training set | Top 8% | 0.6%
27 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 11% | 0.6%
28 | Bioinformatics | 1061 papers in training set | Top 10% | 0.6%