
A New Paralog Removal Pipeline Resolves Conflict between RAD-seq and Enrichment

Zhou, W.; Soghigian, J.; Xiang, Q. J.

2020-10-27 evolutionary biology
10.1101/2020.10.26.355248 bioRxiv

Target enrichment and RAD-seq are well-established high-throughput sequencing technologies that have been increasingly used for phylogenomic studies, and the choice between methods is a practical issue for plant systematists studying the evolutionary histories of biodiversity of relatively recent origin. However, few studies have compared the congruence and conflict between results from the two methods within the same group of organisms, especially in plants, where extensive genome duplication events may complicate phylogenomic analyses. Unfortunately, currently widely used pipelines for target enrichment data analysis do not have a vigorous procedure for removing paralogs in Hyb-Seq data. In this study, we employed RAD-seq and Hyb-Seq of Angiosperm 353 genes in phylogenomic and biogeographic studies of Hamamelis (the witch-hazels) and Castanea (chestnuts), two classic examples exhibiting the well-known eastern Asian-eastern North American disjunct distribution. We compared these two methods side by side and developed a new pipeline (PPD) with a more vigorous removal of putative paralogs from Hyb-Seq data. The new pipeline considers both sequence similarity and heterozygous sites at each locus when identifying paralogs. We used our pipeline to construct robust datasets for comparison between methods and for downstream analyses of the two genera. Our results demonstrated that PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed significant differences between data from HybPiper and data from our new PPD pipeline, likely due to error signals from paralogous genes that were undetected by HybPiper but trimmed by PPD. We found that phylogenies and divergence times estimated from our RAD-seq and Hyb-Seq-PPD data were largely congruent. We highlight the importance of removing paralogs in enrichment data, and discuss the merits of RAD-seq and Hyb-Seq.
Finally, phylogenetic analyses of RAD-seq and Hyb-Seq data resulted in well-resolved species relationships and revealed ancient introgression in both genera. Biogeographic analyses including fossil data revealed a complicated history of each genus involving multiple intercontinental dispersals and local extinctions in areas outside of the taxa's modern ranges in both the Paleogene and Neogene. Our study demonstrates the value of additional steps for filtering paralogous gene content from Angiosperm 353 data, such as our new PPD pipeline described in this study. [RAD-seq, Hyb-Seq, paralogs, Castanea, Hamamelis, eastern Asia-eastern North America disjunction, biogeography, ancient introgression]

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Journal of Systematics and Evolution | 11 papers in training set | Top 0.1% | 28.4%
2 | Molecular Phylogenetics and Evolution | 61 papers in training set | Top 0.1% | 10.3%
3 | Journal of Genetics and Genomics | 36 papers in training set | Top 0.2% | 5.0%
4 | New Phytologist | 309 papers in training set | Top 1% | 5.0%
5 | The Plant Journal | 197 papers in training set | Top 1% | 4.1%
50% of probability mass above
6 | BMC Ecology and Evolution | 49 papers in training set | Top 0.4% | 3.7%
7 | Horticulture Research | 43 papers in training set | Top 0.5% | 3.7%
8 | Systematic Biology | 121 papers in training set | Top 0.2% | 3.7%
9 | PLOS ONE | 4510 papers in training set | Top 38% | 3.7%
10 | eLife | 5422 papers in training set | Top 33% | 2.4%
11 | Ecology and Evolution | 232 papers in training set | Top 2% | 2.1%
12 | Scientific Reports | 3102 papers in training set | Top 57% | 1.7%
13 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 3% | 1.7%
14 | Molecular Biology and Evolution | 488 papers in training set | Top 2% | 1.7%
15 | Molecular Ecology Resources | 161 papers in training set | Top 0.6% | 1.5%
16 | Applications in Plant Sciences | 21 papers in training set | Top 0.2% | 1.4%
17 | Annals of Botany | 43 papers in training set | Top 0.3% | 1.3%
18 | American Journal of Botany | 41 papers in training set | Top 0.3% | 0.9%
19 | PeerJ | 261 papers in training set | Top 12% | 0.9%
20 | Journal of Biogeography | 37 papers in training set | Top 0.2% | 0.9%
21 | Frontiers in Plant Science | 240 papers in training set | Top 5% | 0.9%
22 | BMC Genomics | 328 papers in training set | Top 5% | 0.8%
23 | Methods in Ecology and Evolution | 160 papers in training set | Top 2% | 0.8%
24 | Plant Communications | 35 papers in training set | Top 1% | 0.8%
25 | Molecular Ecology | 304 papers in training set | Top 5% | 0.7%
26 | G3: Genes, Genomes, Genetics | 222 papers in training set | Top 1% | 0.5%
27 | BMC Plant Biology | 47 papers in training set | Top 1% | 0.5%
28 | iScience | 1063 papers in training set | Top 40% | 0.5%
29 | Plant Direct | 81 papers in training set | Top 2% | 0.5%