State Space Misspecification in Morphological Phylogenetics: A Pitfall for Models and Parsimony Alike

Huang, E.

2025-04-26 evolutionary biology
10.1101/2025.04.22.650124 bioRxiv
Phylogenetic analysis relies on two fundamental levels of biological information: genotype and phenotype. Molecular data benefit from operating within a well-defined, finite state space (e.g., nucleotide alphabets), whereas morphological data present inherent challenges due to frequently ambiguous character states and variable state counts. In this study, I use simulated data to examine how state space misspecification (SSM), defined as the mismatch between the assumed and true state space, affects phylogenetic reconstruction. Results show that SSM generally reduces topological accuracy, with the extent of its impact depending on mutation rate, state space disparity, and the proportion of affected characters. Counterintuitively, under conditions typical of empirical morphological datasets (high proportions of binary characters and elevated mutation rates), SSM can improve topological precision. This creates a paradox where an incorrect model outperforms a correct one, though at the cost of distorted branch lengths. Importantly, the effects of SSM extend beyond model-based approaches. I demonstrate, through an extension of the no common mechanism (NCM) model, that standard maximum parsimony is consistent with the assumption that characters evolved under an SSM model, a largely overlooked feature. To address this, I propose a state-space-aware weighting scheme that accounts for variation in character state space. I also discuss additional strategies for mitigating SSM, including model adjustments and reducing reliance on oversimplified binary coding. This work underscores the need to explicitly address state space uncertainty in morphological phylogenetics. As morphology remains crucial for reconstructing deep-time lineages and integrating fossils, accounting for SSM is essential to improving the reliability of evolutionary trees.
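
The abstract does not name the substitution model used in the simulations. As a minimal sketch only, assuming the Mk model (equal rates among k states, with the rate matrix scaled to one expected change per unit branch length), which is commonly applied to discrete morphological characters, the following Python shows how the assumed number of states k enters the transition probabilities. The function names are hypothetical and are not taken from the paper.

```python
import math


def mk_transition_probs(k, t):
    """Return (p_same, p_diff) under an Mk model with k states.

    Assumption: the rate matrix is scaled so that t is the expected
    number of changes per character along the branch.
    p_same is the probability of ending in the starting state;
    p_diff is the probability of ending in one specific other state.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if t < 0:
        raise ValueError("branch length t must be non-negative")
    decay = math.exp(-k * t / (k - 1))
    p_same = 1.0 / k + (k - 1) / k * decay
    p_diff = 1.0 / k - decay / k
    return p_same, p_diff


def pair_log_likelihood(same_state, k, t):
    """Log-likelihood of one character observed in two taxa.

    The two taxa are separated by total branch length t (t > 0), and
    the starting state is drawn from the uniform stationary
    distribution 1/k. same_state says whether the two taxa show the
    same observed state.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    p_same, p_diff = mk_transition_probs(k, t)
    prob = (1.0 / k) * (p_same if same_state else p_diff)
    return math.log(prob)
```

Evaluating `pair_log_likelihood` for the same observation and branch length with, say, `k=2` versus `k=4` gives different values, which is the sense in which an assumed state space that differs from the true one changes the likelihood surface and can distort estimated branch lengths.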

Matching journals

The top-ranked journal alone accounts for at least 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Systematic Biology | 121 | Top 0.1% | 53.5% |
| 2 | Methods in Ecology and Evolution | 160 | Top 0.3% | 10.7% |
| 3 | Molecular Biology and Evolution | 488 | Top 2% | 3.2% |
| 4 | BMC Ecology and Evolution | 49 | Top 0.6% | 2.8% |
| 5 | Molecular Phylogenetics and Evolution | 61 | Top 0.1% | 2.4% |
| 6 | Bulletin of Mathematical Biology | 84 | Top 1% | 1.7% |
| 7 | Evolution | 199 | Top 1% | 1.5% |
| 8 | PLOS Computational Biology | 1633 | Top 17% | 1.5% |
| 9 | Proceedings of the National Academy of Sciences | 2130 | Top 36% | 1.4% |
| 10 | Proceedings of the Royal Society B: Biological Sciences | 341 | Top 5% | 1.4% |
| 11 | Journal of Theoretical Biology | 144 | Top 1% | 1.4% |
| 12 | PLOS Biology | 408 | Top 15% | 1.0% |
| 13 | Molecular Ecology Resources | 161 | Top 0.9% | 0.9% |
| 14 | PLOS ONE | 4510 | Top 63% | 0.9% |
| 15 | Journal of Computational Biology | 37 | Top 0.5% | 0.8% |
| 16 | Applications in Plant Sciences | 21 | Top 0.3% | 0.8% |
| 17 | Bioinformatics | 1061 | Top 10% | 0.7% |
| 18 | Journal of Evolutionary Biology | 98 | Top 1.0% | 0.7% |
| 19 | PeerJ | 261 | Top 17% | 0.7% |
| 20 | Scientific Reports | 3102 | Top 78% | 0.7% |
| 21 | eLife | 5422 | Top 60% | 0.7% |
| 22 | American Journal of Botany | 41 | Top 0.4% | 0.7% |
| 23 | Philosophical Transactions of the Royal Society B | 51 | Top 7% | 0.5% |
| 24 | Journal of Molecular Evolution | 21 | Top 0.5% | 0.5% |
| 25 | Developmental Biology | 134 | Top 3% | 0.5% |
| 26 | New Phytologist | 309 | Top 5% | 0.5% |
| 27 | Systematic Entomology | 11 | Top 0.1% | 0.5% |

The 50% probability-mass cutoff falls after rank 1.