Enrichment Probe Sets Combining Universal and Lineage-Specific Targets Help Resolve Recalcitrant Lineages

Villa-Machio, I.; Masa-Iranzo, I.; Nürk, N. M.; Pokorny, L.; Meseguer, A. S.

2026-03-25 evolutionary biology
10.64898/2026.03.24.713849 bioRxiv

The combination of target capture sequencing (TCS) with low-coverage whole genome sequencing (lcWGS), an approach known as Hyb-Seq, has allowed the integration of natural history collections into the genomics revolution, transforming biodiversity research. To implement Hyb-Seq, a collection of genomic targets, often nuclear orthologs, is needed to design probes for TCS. In flowering plants, the universal Angiosperms353 probe set has proven effective at resolving relationships at multiple evolutionary scales, with caveats. Malpighiales is known to be one of the most challenging flowering plant orders to resolve. Within this order, the clusioid clade (~2.2K species, 94 genera, five families) is no exception. To resolve phylogenetic relationships in this recalcitrant clade, we design a custom probe set, the Clusioids626 kit, composed of 39,936 120-mer probes targeting 626 nuclear orthologs (~6.6M nucleotides). This probe set includes all Angiosperms353 targets and 273 clusioid-specific ones, chosen with copy number, length evenness, and phylogenetic informativeness in mind. We test our probe set on 70 accessions representing all families and tribes in the clusioid clade. On average, 50.4% of TCS reads mapped to our targets, recovering a median of ~600 orthologs. Relationships among all clusioid families are fully resolved with our nuclear targets. Additionally, 105 plastid coding DNA sequences were retrieved from the lcWGS fraction. A strong cyto-nuclear conflict was detected. The Clusioids626 kit performs better than the universal Angiosperms353 enrichment panel alone. Our kit design workflow can be extended to other lineages for which a universal probe set exists but more resolution is needed.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Plant Communications: 35 papers in training set, Top 0.1%, 14.5%
2. The Plant Journal: 197 papers in training set, Top 0.2%, 12.4%
3. Nature Communications: 4913 papers in training set, Top 25%, 7.1%
4. New Phytologist: 309 papers in training set, Top 1%, 6.3%
5. Molecular Ecology Resources: 161 papers in training set, Top 0.3%, 3.9%
6. Nature Plants: 84 papers in training set, Top 0.5%, 3.6%
7. Genome Biology: 555 papers in training set, Top 2%, 3.6%

50% of probability mass above

8. Applications in Plant Sciences: 21 papers in training set, Top 0.1%, 3.5%
9. Horticulture Research: 43 papers in training set, Top 0.6%, 3.0%
10. The Plant Cell: 141 papers in training set, Top 1%, 2.6%
11. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 28%, 2.1%
12. eLife: 5422 papers in training set, Top 36%, 2.1%
13. Molecular Plant: 36 papers in training set, Top 0.6%, 2.1%
14. Communications Biology: 886 papers in training set, Top 7%, 1.9%
15. Bioinformatics: 1061 papers in training set, Top 7%, 1.9%
16. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.8%
17. Molecular Biology and Evolution: 488 papers in training set, Top 2%, 1.8%
18. Journal of Systematics and Evolution: 11 papers in training set, Top 0.1%, 1.8%
19. Frontiers in Plant Science: 240 papers in training set, Top 4%, 1.7%
20. Science: 429 papers in training set, Top 14%, 1.7%
21. BMC Genomics: 328 papers in training set, Top 3%, 1.5%
22. Methods in Ecology and Evolution: 160 papers in training set, Top 2%, 1.5%
23. Nature Genetics: 240 papers in training set, Top 5%, 1.3%
24. Scientific Reports: 3102 papers in training set, Top 67%, 1.2%
25. Genome Research: 409 papers in training set, Top 4%, 0.9%
26. Systematic Biology: 121 papers in training set, Top 0.4%, 0.9%
27. Scientific Data: 174 papers in training set, Top 2%, 0.8%
28. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 6%, 0.7%
29. PLOS ONE: 4510 papers in training set, Top 68%, 0.7%
30. Current Biology: 596 papers in training set, Top 15%, 0.6%
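
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running sum. The sketch below is only an illustration, not the site's actual method: it assumes the final percentage in each journal entry above is that journal's predicted probability, and the function name `journals_to_reach` is hypothetical. Only the first ten values are copied in, which is enough to pass the 50% mark.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-10, copied from the list above.
# Assumption: the last percentage in each entry is the predicted probability.
probs = [14.5, 12.4, 7.1, 6.3, 3.9, 3.6, 3.6, 3.5, 3.0, 2.6]


def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative %) at the first rank whose running sum meets the threshold."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank, total
    return None  # threshold never reached with the values supplied


rank, total = journals_to_reach(probs, 50.0)
print(f"{rank} journals reach {total:.1f}% of the probability mass")
```

With these values the running sum first passes 50% at rank 7, consistent with the cutoff marked in the list.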