
A novel data filtering method resolves the controversy in the phylogeny of the Chondrichthyes

Huang, J.; Hofreiter, M.; Noble, L. R.; Straube, N.; Naylor, G. J. P.; Li, C.

2025-08-23 evolutionary biology
10.1101/2025.08.19.671162 bioRxiv

Phylogenomics, which uses genome-scale data for phylogenetic inference, has clarified many controversial nodes in the tree of life. Such extensive data improve tree resolution and better reflect organismal history than analyses based on one or a few genetic loci. However, some relationships remain unresolved, because with more data, systematic errors can produce high node support without ensuring accuracy. For example, the order-level relationships among chondrichthyans are still contentious despite the use of phylogenomic data. To address systematic errors, complex models have been developed, and filtering data to retain less error-prone loci shows great promise. Current metric-based filtering methods rank loci by whole-tree statistics, whereas problematic signals are often local, and topology-based filtering approaches suffer from circular assumptions. In this study, we introduced two novel metric-based data filtering methods based on the ratio of local branch length, or of GC content, between problematic clades. We applied these methods to a dataset of 4,452 single-copy exons extracted from 98 chondrichthyan species. Analyses using all loci placed Hexanchiformes at the root of Elasmobranchii, pulling the other squalomorphs to basal positions and rendering Squalomorphii paraphyletic. In contrast, when loci with more even branch lengths were retained using the branch-ratio method (absRatioLen), the analyses strongly supported the monophyly of all chondrichthyan superorders as well as higher-level groupings such as Selachii and Batoidea. By focusing on problematic nodes, our assumption-free filtering methods show considerable potential for resolving contentious relationships in the tree of life.
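The abstract describes the two filters only as ratios of local branch length or GC content between problematic clades, and it does not give the exact definition of absRatioLen. The Python sketch below is one possible reading under stated assumptions, not the authors' implementation: it assumes each locus has already been summarised by a local branch length and a GC fraction for each of two conflicting clades, scores each locus by the absolute log ratio |log(x_A / x_B)| (0 means perfectly even), and keeps the most even loci. The names `LocusSummary`, `abs_log_ratio` and `filter_loci`, the log-ratio form, the keep-top-n rule, and the toy numbers are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class LocusSummary:
    """Hypothetical per-locus summary; field names are illustrative, not from the paper."""
    name: str
    len_a: float  # assumed local branch length for clade A at this locus (precomputed from its gene tree)
    len_b: float  # assumed local branch length for clade B at this locus
    gc_a: float   # GC fraction of clade A sequences at this locus
    gc_b: float   # GC fraction of clade B sequences at this locus


def gc_content(seq: str) -> float:
    """GC fraction among unambiguous A/C/G/T characters (gaps and N are ignored)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return (s.count("G") + s.count("C")) / acgt


def abs_log_ratio(x: float, y: float) -> float:
    """|log(x / y)|: 0 when x == y and symmetric in x and y (assumed form of the ratio score)."""
    if x <= 0 or y <= 0:
        raise ValueError("ratio inputs must be positive")
    return abs(math.log(x / y))


def filter_loci(loci: List[LocusSummary], metric: str, n_keep: int) -> List[str]:
    """Return the names of the n_keep loci whose two clades are most even under the metric."""
    if metric == "length":
        def score(locus: LocusSummary) -> float:
            return abs_log_ratio(locus.len_a, locus.len_b)
    elif metric == "gc":
        def score(locus: LocusSummary) -> float:
            return abs_log_ratio(locus.gc_a, locus.gc_b)
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    ranked = sorted(loci, key=score)  # most even (lowest score) first
    return [locus.name for locus in ranked[:n_keep]]


# Made-up summaries for three loci, for illustration only.
demo_loci = [
    LocusSummary("exon1", len_a=0.10, len_b=0.11, gc_a=0.45, gc_b=0.46),
    LocusSummary("exon2", len_a=0.05, len_b=0.20, gc_a=0.40, gc_b=0.55),
    LocusSummary("exon3", len_a=0.08, len_b=0.12, gc_a=0.50, gc_b=0.50),
]

print(filter_loci(demo_loci, "length", 2))  # ['exon1', 'exon3']
print(filter_loci(demo_loci, "gc", 2))      # ['exon3', 'exon1']
```

A log ratio is used in this sketch so that a clade A twice as long as clade B scores the same as the reverse case; a plain ratio would need a min/max normalisation to be symmetric. Because each locus is scored only on the two clades of interest, the filter targets a local conflict rather than whole-tree statistics, which is the contrast the abstract draws.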

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Molecular Phylogenetics and Evolution: 61 papers in training set; Top 0.1%; predicted probability 22.4%
2. Systematic Biology: 121 papers in training set; Top 0.1%; predicted probability 14.7%
3. BMC Ecology and Evolution: 49 papers in training set; Top 0.1%; predicted probability 6.8%
4. Methods in Ecology and Evolution: 160 papers in training set; Top 0.6%; predicted probability 6.3%
50% of probability mass above
5. Journal of Systematics and Evolution: 11 papers in training set; Top 0.1%; predicted probability 4.8%
6. eLife: 5422 papers in training set; Top 26%; predicted probability 3.6%
7. Scientific Reports: 3102 papers in training set; Top 37%; predicted probability 3.6%
8. Molecular Biology and Evolution: 488 papers in training set; Top 2%; predicted probability 3.1%
9. Bioinformatics: 1061 papers in training set; Top 6%; predicted probability 2.1%
10. Molecular Ecology Resources: 161 papers in training set; Top 0.5%; predicted probability 1.9%
11. BMC Genomics: 328 papers in training set; Top 2%; predicted probability 1.9%
12. Communications Biology: 886 papers in training set; Top 7%; predicted probability 1.8%
13. PLOS Computational Biology: 1633 papers in training set; Top 16%; predicted probability 1.7%
14. PLOS ONE: 4510 papers in training set; Top 54%; predicted probability 1.7%
15. PeerJ: 261 papers in training set; Top 7%; predicted probability 1.7%
16. Journal of Computational Biology: 37 papers in training set; Top 0.2%; predicted probability 1.7%
17. Systematic Entomology: 11 papers in training set; Top 0.1%; predicted probability 1.7%
18. Ecology and Evolution: 232 papers in training set; Top 3%; predicted probability 0.9%
19. PLOS Biology: 408 papers in training set; Top 18%; predicted probability 0.8%
20. Genome Biology and Evolution: 280 papers in training set; Top 2%; predicted probability 0.7%
21. New Phytologist: 309 papers in training set; Top 5%; predicted probability 0.7%
22. Journal of Genetics and Genomics: 36 papers in training set; Top 2%; predicted probability 0.7%
23. Journal of Molecular Evolution: 21 papers in training set; Top 0.4%; predicted probability 0.7%
24. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 46%; predicted probability 0.7%
25. NAR Genomics and Bioinformatics: 214 papers in training set; Top 4%; predicted probability 0.7%