
Reference genome choice impacts SNP recovery but not evolutionary inference in young species

Soares, L. S.; Goncalves, L. T.; Guzman-Rodriguez, S.; Bombarely, A.; Freitas, L. B.

2026-02-06 bioinformatics
10.64898/2026.02.04.703758 bioRxiv

Reduced-representation sequencing approaches such as RAD-seq are widely used in population genomics and phylogenetics, particularly for non-model organisms. However, bioinformatics choices during data processing can strongly influence downstream analyses. One key but underexplored factor is the reference genome used for read alignment and SNP discovery. Here, we evaluate the effects of reference genome choice on RAD-seq analyses using multiple datasets spanning recent radiations in Petunia and Calibrachoa, and reference genomes that differ in phylogenetic relatedness. When using congeneric reference genomes, we observed highly consistent mapping rates, SNP recovery, and downstream population genomic patterns. In contrast, mapping to more distantly related genomes resulted in lower mapping rates and stronger effects on summary statistics. Despite these quantitative reductions, broader patterns of genetic structure and diversity, as well as evolutionary relationships, remained largely congruent across reference genomes. Overall, our results indicate that reference genome choice matters most when genomes are distantly related or when analyses target fine-scale genomic signals. For recent radiations with largely conserved genome structure, closely related reference genomes yield comparable SNP datasets and lead to the same biological conclusions regarding population structure and phylogenetic relationships. These findings provide practical guidance for RAD-seq studies in non-model systems, showing that congeneric reference genomes are sufficient for robust population and phylogenetic inference, and that more distantly related genomes can remain informative when no close reference is available.
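One way to check whether broad genetic structure is congruent across reference genomes is to build a pairwise genetic-distance matrix from each SNP dataset and correlate the two matrices. The sketch below shows this under stated assumptions; it is not the authors' pipeline. It assumes genotype matrices for the same individuals, coded as 0/1/2 allele counts with `np.nan` for missing calls. The helper names (`pairwise_distance`, `structure_congruence`), the distance measure (mean per-site allele-count difference), and the toy data are all hypothetical. In the toy example, the dataset "called against reference B" is simply a subset of reference A's SNPs, which mimics lower SNP recovery. Real comparisons across references would also have to deal with SNPs called independently at non-homologous coordinates.

```python
import numpy as np


def pairwise_distance(genotypes: np.ndarray) -> np.ndarray:
    """Mean per-site allele difference between every pair of individuals.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts, with
    np.nan for missing calls. A site missing in either individual of a pair
    is ignored for that pair (np.nanmean skips NaN differences).
    """
    n = genotypes.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Absolute allele-count difference, rescaled from [0, 2] to [0, 1]
            diff = np.abs(genotypes[i] - genotypes[j]) / 2.0
            dist[i, j] = dist[j, i] = np.nanmean(diff)
    return dist


def structure_congruence(geno_a: np.ndarray, geno_b: np.ndarray) -> float:
    """Pearson correlation between the upper triangles of the two distance
    matrices built from the same individuals genotyped against different
    (hypothetical) reference genomes."""
    da = pairwise_distance(geno_a)
    db = pairwise_distance(geno_b)
    iu = np.triu_indices_from(da, k=1)  # off-diagonal pairs only
    return float(np.corrcoef(da[iu], db[iu])[0, 1])


# Toy data (hypothetical): two strongly differentiated populations of 10 individuals.
rng = np.random.default_rng(1)
n_snps = 500
p1 = rng.uniform(0.05, 0.3, n_snps)  # allele frequencies in population 1
p2 = 1.0 - p1                        # population 2: mirrored frequencies
geno_ref_a = np.vstack([
    rng.binomial(2, p1, size=(10, n_snps)),
    rng.binomial(2, p2, size=(10, n_snps)),
]).astype(float)
# Fewer SNPs recovered when mapping to a more distant reference (simplified as a subset)
geno_ref_b = geno_ref_a[:, :300]

r = structure_congruence(geno_ref_a, geno_ref_b)
print(f"distance-matrix correlation: {r:.3f}")
```

A Pearson correlation of distance matrices is used here only because it is simple. A Mantel test, or comparing PCA or ancestry-coefficient results, are common alternatives when significance or finer structure matters.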

Matching journals

The top 4 journals account for at least 50% of the predicted probability mass (51.4% combined).

1. Molecular Ecology Resources: 161 papers in training set; Top 0.1%; predicted probability 18.7%
2. Molecular Ecology: 304 papers in training set; Top 0.5%; predicted probability 12.5%
3. New Phytologist: 309 papers in training set; Top 0.5%; predicted probability 10.1%
4. Systematic Biology: 121 papers in training set; Top 0.1%; predicted probability 10.1%
(50% of probability mass above)
5. Molecular Biology and Evolution: 488 papers in training set; Top 0.6%; predicted probability 7.2%
6. Nature Communications: 4913 papers in training set; Top 35%; predicted probability 4.3%
7. Methods in Ecology and Evolution: 160 papers in training set; Top 0.7%; predicted probability 4.2%
8. Genome Biology and Evolution: 280 papers in training set; Top 0.6%; predicted probability 3.1%
9. PLOS Genetics: 756 papers in training set; Top 6%; predicted probability 2.6%
10. Scientific Reports: 3102 papers in training set; Top 50%; predicted probability 2.1%
11. BMC Genomics: 328 papers in training set; Top 2%; predicted probability 2.1%
12. Genome Biology: 555 papers in training set; Top 5%; predicted probability 1.5%
13. Peer Community Journal: 254 papers in training set; Top 2%; predicted probability 1.3%
14. BMC Biology: 248 papers in training set; Top 2%; predicted probability 1.2%
15. Molecular Phylogenetics and Evolution: 61 papers in training set; Top 0.3%; predicted probability 1.0%
16. Heredity: 53 papers in training set; Top 0.2%; predicted probability 0.9%
17. eLife: 5422 papers in training set; Top 55%; predicted probability 0.8%
18. NAR Genomics and Bioinformatics: 214 papers in training set; Top 4%; predicted probability 0.7%
19. Genome Research: 409 papers in training set; Top 4%; predicted probability 0.7%
20. PLOS Computational Biology: 1633 papers in training set; Top 27%; predicted probability 0.6%
21. Evolution: 199 papers in training set; Top 2%; predicted probability 0.6%
22. BMC Bioinformatics: 383 papers in training set; Top 8%; predicted probability 0.6%
23. PeerJ: 261 papers in training set; Top 17%; predicted probability 0.6%
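The claim that the top four journals cover half of the predicted probability mass can be checked by accumulating the listed percentages in rank order. Below is a small sketch. The helper `rank_reaching` is hypothetical, and the values are the five highest-ranked predicted probabilities copied from the list above.

```python
from itertools import accumulate

# Predicted probabilities (%) of the five top-ranked journals, in rank order,
# copied from the list above.
top_probs = [
    ("Molecular Ecology Resources", 18.7),
    ("Molecular Ecology", 12.5),
    ("New Phytologist", 10.1),
    ("Systematic Biology", 10.1),
    ("Molecular Biology and Evolution", 7.2),
]


def rank_reaching(probs, threshold):
    """Return the 1-based rank at which the running total first reaches
    threshold, or None if it never does."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None


values = [p for _, p in top_probs]
cumulative = list(accumulate(values))
print([round(c, 1) for c in cumulative])  # [18.7, 31.2, 41.3, 51.4, 58.6]
print(rank_reaching(values, 50.0))        # 4
```

The running total first passes 50% at rank 4 (51.4%). This is why the "50% of probability mass above" marker sits after Systematic Biology.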