
Decontaminating genomic data for accurate species delineation and hybrid detection in the Lasius ant genus

Jecha, K.; Lavanchy, G.; Schwander, T.

2024-12-02 genomics
10.1101/2024.11.27.625433 bioRxiv

Advancements in genetic technologies have allowed us to generate large datasets relatively quickly and easily. However, without proper quality control checks, the inferences drawn from such data can be erroneous and can go on to misinform further studies. DNA contamination between focal samples of the same or closely related species can have major impacts on downstream analyses, but its presence is seldom tested. Here, we created a pipeline that combines competitive mapping, to remove reads derived from intergeneric contamination, with a filtering method based on allelic depth ratio frequencies, to exclude intrageneric contamination. We then used a RADseq dataset of over 1,000 Swiss Lasius ants, which had been cross-contaminated to various levels prior to sequencing, to assess the impact of contamination on inferences of introgression. The original dataset showed widespread introgression between species for which hybridization has never been recorded. After thorough decontamination, we found only one individual with a strong signature of introgression, between L. emarginatus and L. platythorax, revealing that introgression is extremely rare in this genus. Implementing our filtering method can significantly improve the robustness of biological findings based on genomic datasets. We recommend systematically checking for cross-contamination as a key step in the preprocessing of genomic datasets.
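The intuition behind allelic-depth-ratio filtering can be illustrated with a small sketch. This is not the authors' pipeline: it assumes diploid samples, where a true heterozygote should show a minor-allele read ratio near 0.5, whereas reads leaking in from another individual tend to produce heterozygous calls with a low minor-allele ratio. The names `SampleAD`, `minor_allele_ratios`, `low_ratio_fraction` and `flag_contaminated`, and the thresholds `min_depth`, `low_cutoff` and `max_low_fraction`, are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SampleAD:
    """Hypothetical container: (ref_depth, alt_depth) read counts at the
    heterozygous genotype calls of one individual (e.g. taken from a VCF AD field)."""
    name: str
    het_depths: Sequence[Tuple[int, int]]


def minor_allele_ratios(het_depths: Sequence[Tuple[int, int]], min_depth: int = 10) -> List[float]:
    """Return minor-allele depth / total depth for each sufficiently covered site."""
    ratios = []
    for ref, alt in het_depths:
        total = ref + alt
        if total < min_depth:  # skip low-coverage sites, where ratios are noisy
            continue
        ratios.append(min(ref, alt) / total)
    return ratios


def low_ratio_fraction(ratios: Sequence[float], low_cutoff: float = 0.2) -> float:
    """Fraction of heterozygous calls whose minor-allele ratio is below low_cutoff."""
    if not ratios:
        return 0.0
    return sum(r < low_cutoff for r in ratios) / len(ratios)


def flag_contaminated(samples: Sequence[SampleAD], low_cutoff: float = 0.2,
                      max_low_fraction: float = 0.3, min_depth: int = 10) -> List[str]:
    """Flag samples whose heterozygous calls are dominated by skewed (low) ratios.
    All thresholds are illustrative assumptions, not values from the paper."""
    flagged = []
    for s in samples:
        ratios = minor_allele_ratios(s.het_depths, min_depth)
        if low_ratio_fraction(ratios, low_cutoff) > max_low_fraction:
            flagged.append(s.name)
    return flagged


if __name__ == "__main__":
    clean = SampleAD("clean", [(10, 11), (12, 9), (15, 15), (8, 12)])
    mixed = SampleAD("mixed", [(18, 2), (19, 2), (10, 10), (17, 3)])
    print(flag_contaminated([clean, mixed]))  # ['mixed']
```

In this toy example the "mixed" sample has three of four heterozygous calls with a minor-allele ratio around 0.1 to 0.15, a pattern expected from low-level cross-contamination rather than true heterozygosity. Real data would call for thresholds calibrated on the observed ratio distribution of each dataset.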

Matching journals

Together, the top 2 journals account for just over 50% (56.4%) of the predicted probability mass.

1. Molecular Ecology Resources: 41.4% (161 papers in training set; Top 0.1%)
2. Molecular Ecology: 15.0% (304 papers in training set; Top 0.4%)
(50% of probability mass above this line)
3. PLOS ONE: 5.1% (4510 papers in training set; Top 30%)
4. PeerJ: 3.7% (261 papers in training set; Top 2%)
5. Scientific Reports: 3.0% (3102 papers in training set; Top 42%)
6. BMC Genomics: 2.0% (328 papers in training set; Top 2%)
7. Applications in Plant Sciences: 1.7% (21 papers in training set; Top 0.1%)
8. G3: 1.4% (33 papers in training set; Top 0.3%)
9. BMC Biology: 1.3% (248 papers in training set; Top 2%)
10. G3 Genes|Genomes|Genetics: 1.3% (351 papers in training set; Top 2%)
11. Journal of Heredity: 1.2% (35 papers in training set; Top 0.1%)
12. DNA Research: 1.0% (23 papers in training set; Top 0.4%)
13. Genes: 1.0% (126 papers in training set; Top 2%)
14. Methods in Ecology and Evolution: 1.0% (160 papers in training set; Top 2%)
15. Peer Community Journal: 0.9% (254 papers in training set; Top 3%)
16. Frontiers in Marine Science: 0.9% (55 papers in training set; Top 0.9%)
17. Genome Biology and Evolution: 0.8% (280 papers in training set; Top 2%)
18. Frontiers in Plant Science: 0.8% (240 papers in training set; Top 5%)
19. Ecology and Evolution: 0.8% (232 papers in training set; Top 4%)
20. BMC Bioinformatics: 0.7% (383 papers in training set; Top 7%)
21. Molecular Biology and Evolution: 0.7% (488 papers in training set; Top 4%)
22. NAR Genomics and Bioinformatics: 0.7% (214 papers in training set; Top 4%)
23. GigaScience: 0.7% (172 papers in training set; Top 3%)
24. Philosophical Transactions of the Royal Society B: 0.7% (51 papers in training set; Top 6%)
25. Aquaculture: 0.5% (29 papers in training set; Top 0.8%)
26. Environmental DNA: 0.5% (49 papers in training set; Top 0.4%)
27. Frontiers in Genetics: 0.5% (197 papers in training set; Top 12%)