How many are you? Open data and bioinformatics reveal species misidentification and potential introgression in Chordodes (Phylum Nematomorpha)

De Vivo, M.

2026-02-05 bioinformatics
10.64898/2026.02.03.703548 bioRxiv

Open genomic data can help us understand patterns in biodiversity and identify morphologically similar species. One taxon where this is useful is Nematomorpha, one of the less studied animal phyla, for which data have only recently become available and in which species identification can be difficult. In this study, I initially planned to evaluate the use of mitochondrial data for population analyses using an RNA sequencing (RNA-seq) dataset labelled as belonging to Chordodes fukuii. After sequences extracted from the barcoding gene cytochrome c oxidase subunit I (COXI) gave surprising results, I evaluated species delimitation by combining a previously released double-digest restriction-site-associated DNA sequencing (ddRADseq) dataset from the SRA with the RNA-seq dataset. PCA, analyses with the R package "adegenet" and ADMIXTURE confirmed the presence of two species in the RNA-seq dataset, which should be labelled as C. formosanus and C. japonensis; however, some individuals assigned to C. japonensis by COXI clustered with C. formosanus specimens or showed some C. formosanus ancestry when more data were used, indicating potential introgression or incomplete lineage sorting. The study shows how previously released data can be used to evaluate species delimitation and potential past demographic events, and to identify potential needs in DNA barcoding and genomics for avoiding future misidentification of morphologically similar species.
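To illustrate the kind of signal a PCA on genotype data can give for species delimitation, here is a minimal sketch in Python. It is not the pipeline used in the study, which relied on R ("adegenet") and ADMIXTURE. The genotype matrix is invented toy data, the function name `pc1_scores` is hypothetical, and the first principal component is obtained by plain power iteration rather than a library PCA so that the example needs only the standard library.

```python
import math


def pc1_scores(genotypes, iterations=200):
    """Return the first-principal-component score of each individual.

    genotypes: one row per individual, each row a list of alternate-allele
    counts (0, 1 or 2) per SNP. Hypothetical input format for illustration.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center every SNP column on its mean allele count.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    # Power iteration on X^T X without forming it: v <- X^T (X v), normalized.
    v = [float(j + 1) for j in range(m)]  # deterministic, non-uniform start
    for _ in range(iterations):
        xv = [sum(r[j] * v[j] for j in range(m)) for r in x]
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]

    # Project each centered individual onto the leading axis.
    return [sum(r[j] * v[j] for j in range(m)) for r in x]


# Invented toy data: six individuals, six SNPs, two genetically distinct groups.
toy_genotypes = [
    [0, 0, 0, 1, 2, 2],  # group A
    [0, 1, 0, 0, 2, 2],  # group A
    [0, 0, 1, 0, 2, 1],  # group A
    [2, 2, 2, 1, 0, 0],  # group B
    [2, 1, 2, 2, 0, 0],  # group B
    [2, 2, 1, 2, 0, 1],  # group B
]

scores = pc1_scores(toy_genotypes)
for label, score in zip("AAABBB", scores):
    print(label, round(score, 3))
```

In this toy case the two groups fall on opposite sides of zero along PC1 (the sign of the axis itself is arbitrary). With real data, an individual whose score lies between the clusters, or inside the cluster of the other species, would be the kind of pattern that prompts a check for introgression or incomplete lineage sorting.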

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. PLOS ONE | 4510 papers in training set | Top 2% | 34.2%
2. PeerJ | 261 papers in training set | Top 0.1% | 12.8%
3. Gigabyte | 60 papers in training set | Top 0.2% | 5.0%
(50% of probability mass above)
4. Scientific Reports | 3102 papers in training set | Top 33% | 3.8%
5. Animals | 20 papers in training set | Top 0.2% | 2.7%
6. Ecology and Evolution | 232 papers in training set | Top 2% | 2.4%
7. F1000Research | 79 papers in training set | Top 1% | 2.0%
8. BMC Bioinformatics | 383 papers in training set | Top 4% | 2.0%
9. Genes | 126 papers in training set | Top 0.8% | 1.8%
10. Ecological Informatics | 29 papers in training set | Top 0.3% | 1.8%
11. Biology Methods and Protocols | 53 papers in training set | Top 1% | 1.3%
12. PLOS Neglected Tropical Diseases | 378 papers in training set | Top 4% | 1.3%
13. Insects | 36 papers in training set | Top 0.8% | 1.0%
14. Systematic Entomology | 11 papers in training set | Top 0.1% | 0.9%
15. Biology | 43 papers in training set | Top 2% | 0.9%
16. Frontiers in Genetics | 197 papers in training set | Top 8% | 0.9%
17. Heliyon | 146 papers in training set | Top 5% | 0.8%
18. BMC Genomics | 328 papers in training set | Top 5% | 0.8%
19. Journal of Computational Biology | 37 papers in training set | Top 0.5% | 0.8%
20. Gene | 41 papers in training set | Top 2% | 0.8%
21. GigaScience | 172 papers in training set | Top 3% | 0.8%
22. Parasites & Vectors | 57 papers in training set | Top 1% | 0.8%
23. BMC Ecology and Evolution | 49 papers in training set | Top 2% | 0.8%
24. Genomics | 60 papers in training set | Top 3% | 0.7%
25. Environmental DNA | 49 papers in training set | Top 0.3% | 0.7%
26. Metabarcoding and Metagenomics | 12 papers in training set | Top 0.1% | 0.7%
27. Peer Community Journal | 254 papers in training set | Top 4% | 0.7%
28. Frontiers in Veterinary Science | 30 papers in training set | Top 1% | 0.5%
29. Plants | 39 papers in training set | Top 2% | 0.5%
30. Molecular Ecology Resources | 161 papers in training set | Top 1% | 0.5%
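
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked from the listed probabilities. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, and the values are the first five probabilities from the list, written in tenths of a percent so that the sum uses exact integer arithmetic.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative) for the shortest ranked prefix whose
    cumulative probability reaches the threshold, or None if it never does."""
    total = 0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None


# First five listed probabilities in tenths of a percent: 34.2%, 12.8%, 5.0%, 3.8%, 2.7%.
top_five = [342, 128, 50, 38, 27]

count, cumulative = journals_to_reach(top_five, 500)
print(count, cumulative / 10)  # 3 journals, 52.0% cumulative
```

The first two journals sum to 47.0%, so the third is needed to pass the 50% mark, which matches the position of the "50% of probability mass above" divider.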