Decontaminating genomic data for accurate species delineation and hybrid detection in the Lasius ant genus
Jecha, K.; Lavanchy, G.; Schwander, T.
Advances in genetic technologies have allowed us to generate large datasets relatively quickly and easily. However, without proper quality control checks, the inferences drawn from such data can be erroneous and go on to misinform further studies. DNA contamination between focal samples of the same or closely related species can have major impacts on downstream analyses, but its presence is seldom tested. Here, we created a pipeline that combines competitive mapping to remove reads from intergeneric contamination with a filtering method that uses allelic depth ratio frequencies to exclude intrageneric contamination. We then used a RADseq dataset of over 1,000 Swiss Lasius ants that were cross-contaminated to varying degrees prior to sequencing to assess the impact of contamination on inferences of introgression. The original dataset showed widespread introgression between species in which hybridization has never been recorded. After thorough decontamination, we found only one individual with a strong signature of introgression, between the species L. emarginatus and L. platythorax, revealing that introgression is extremely rare in this genus. Implementing our filtering method can significantly improve the robustness of biological findings based on genomic datasets. We recommend systematic checks for cross-contamination as a key step in the preprocessing of genomic datasets.
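To illustrate the idea behind filtering on allelic depth ratios, the following is a minimal sketch and not the authors' pipeline. It assumes diploid samples, in which reads at a true heterozygous site should split roughly evenly between the two alleles, so an excess of strongly skewed ratios may point to low-level contamination from another sample. The function names, the input format (pairs of reference and alternate read depths at heterozygous calls, such as might be taken from the AD field of a VCF file), and all thresholds are hypothetical choices made for illustration; the abstract does not state the criteria actually used.

```python
from typing import List, Tuple

# Hypothetical input: one (ref_depth, alt_depth) pair per heterozygous call in a sample.
DepthPair = Tuple[int, int]


def allelic_depth_ratios(het_sites: List[DepthPair], min_depth: int = 10) -> List[float]:
    """Return minor-allele depth ratios for calls with at least `min_depth` reads."""
    ratios = []
    for ref_depth, alt_depth in het_sites:
        total = ref_depth + alt_depth
        if total < min_depth:
            continue  # too few reads to estimate a ratio reliably
        ratios.append(min(ref_depth, alt_depth) / total)
    return ratios


def fraction_skewed(ratios: List[float], low: float = 0.3) -> float:
    """Fraction of heterozygous calls whose minor-allele ratio is below `low`."""
    if not ratios:
        return 0.0
    return sum(r < low for r in ratios) / len(ratios)


def is_likely_contaminated(
    het_sites: List[DepthPair],
    min_depth: int = 10,
    low: float = 0.3,          # assumed cutoff for a "skewed" ratio
    max_skewed: float = 0.2,   # assumed tolerated fraction of skewed calls
) -> bool:
    """Flag a sample when too many heterozygous calls have skewed depth ratios."""
    ratios = allelic_depth_ratios(het_sites, min_depth)
    return fraction_skewed(ratios, low) > max_skewed


if __name__ == "__main__":
    balanced = [(10, 10), (12, 9), (8, 11), (15, 14)]
    skewed = [(18, 2), (27, 3), (19, 1), (10, 10)]
    print(is_likely_contaminated(balanced))
    print(is_likely_contaminated(skewed))
```

In practice the cutoffs would need to be tuned to the sequencing depth and to the ratio distribution observed across all samples, rather than fixed in advance as they are here.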