Shape Dimensionality Metrics for Landmark Data

O'Keefe, F. R.

2020-07-24 evolutionary biology
10.1101/2020.07.23.218289 bioRxiv

This paper is concerned with rank deficiency in phenotypic covariance matrices: first to establish, by measuring it, that rank deficiency is a problem, and then to propose methods to treat it. Significant rank deficiency can mislead current measures of whole-shape phenotypic integration, because these measures rely on the eigenvalues of the covariance matrix, and a highly rank-deficient matrix will have a large percentage of meaningless eigenvalues. This paper has three goals. The first is to examine a typical geometric morphometric data set and establish that its covariance matrix is rank deficient. We employ the concept of information, or Shannon, entropy to demonstrate that a sample of dire wolf jaws is highly rank deficient. The different sources of rank deficiency are identified; they include the Generalized Procrustes analysis itself, use of the correlation matrix, insufficient sample size, and phenotypic covariance. Only the last of these is of biological interest. Our second goal is to examine a test case in which a change in integration is known, allowing us to document how rank deficiency affects two measures of whole-shape integration (eigenvalue standard deviation and standardized generalized variance). This test case uses the dire wolf data set from Part 1 and introduces another population that is 5000 years older. Modularity models are generated and tested for both populations, showing that one population is more integrated than the other. We demonstrate that eigenvalue variance characterizes the integration change incorrectly, while the standardized generalized variance lacks sensitivity. Both metrics are affected by the inclusion of many small eigenvalues arising from rank deficiency of the covariance matrix. We propose a modification of the standardized generalized variance, again based on information entropy, that considers only the eigenvalues carrying non-redundant information.
We demonstrate that this metric successfully identifies the integration change in the test case. The third goal of this paper is to generalize the new metric to arbitrary sample sizes. This is done by normalizing the new metric to the amount of information present in a permuted covariance matrix. We term the resulting metric the relative dispersion; it is corrected for sample size. As a proof of concept, we use the new metric to compare the dire wolf data set from the first part of this paper to a third data set comprising jaws of Smilodon fatalis. We demonstrate that the Smilodon jaw is much more integrated than the dire wolf jaw. Finally, these information entropy-based measures of integration allow comparison of whole-shape integration in dense semilandmark environments, and thus characterization of the information content of any given shape, a quantity we term latent dispersion.
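The abstract does not state formulas for these metrics, so the following Python (NumPy) sketch is only an illustration under stated assumptions. It takes the standardized generalized variance (SGV) to be det(S)^(1/p), the geometric mean of all p covariance eigenvalues, and measures information content as the Shannon entropy of the eigenvalues rescaled to proportions, with exp(entropy) used as an "effective rank". The functions `entropy_sgv` and `relative_dispersion` are hypothetical readings of the modified metric and its permutation-based normalization, not the author's definitions, and all function names are illustrative. The demo shows the rank-deficiency problem directly: with fewer specimens than variables, most eigenvalues are zero and the ordinary SGV collapses to 0.

```python
import numpy as np


def eigenvalues(data, tol=1e-10):
    """Eigenvalues, largest first, of the sample covariance matrix of an
    (n specimens x p variables) array; values below tol * max are set to 0."""
    cov = np.cov(data, rowvar=False)
    vals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    vals[vals < tol * vals.max()] = 0.0   # treat round-off as exact zeros
    return vals


def eigenvalue_sd(vals):
    """Plain standard deviation of the eigenvalues (published versions rescale it)."""
    return float(np.std(vals))


def standardized_generalized_variance(vals):
    """det(S)**(1/p): geometric mean of all p eigenvalues; any zero forces 0."""
    with np.errstate(divide="ignore"):   # log(0) -> -inf without a warning
        return float(np.exp(np.mean(np.log(vals))))


def shannon_entropy(vals):
    """Shannon entropy (nats) of the eigenvalues rescaled to proportions."""
    p = vals / vals.sum()
    p = p[p > 0]                          # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())


def effective_rank(vals):
    """exp(entropy): an entropy-based count of effective dimensions."""
    return float(np.exp(shannon_entropy(vals)))


def entropy_sgv(vals):
    """HYPOTHETICAL variant: geometric mean of only the k largest eigenvalues,
    with k = round(effective rank), so zero eigenvalues are excluded."""
    k = max(1, int(round(effective_rank(vals))))
    return standardized_generalized_variance(vals[:k])


def relative_dispersion(data, n_perm=99, seed=0):
    """HYPOTHETICAL normalization: observed entropy divided by the mean entropy
    of data whose columns were shuffled independently (covariance destroyed)."""
    rng = np.random.default_rng(seed)
    h_obs = shannon_entropy(eigenvalues(data))
    h_null = []
    for _ in range(n_perm):
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        h_null.append(shannon_entropy(eigenvalues(shuffled)))
    return h_obs / float(np.mean(h_null))


# Demo: fewer specimens than variables makes the covariance matrix rank deficient.
rng = np.random.default_rng(1)
n, p = 10, 30
data = rng.normal(size=(n, p))
vals = eigenvalues(data)
print("non-zero eigenvalues:", int(np.sum(vals > 0)))   # at most n - 1 = 9
print("SGV:", standardized_generalized_variance(vals))   # 0.0: zero eigenvalues dominate
print("effective rank:", effective_rank(vals))
print("entropy-truncated SGV:", entropy_sgv(vals))
```

In this sketch, shuffling each variable independently across specimens keeps each variable's variance but removes covariation, which is one plausible way to build the "permuted covariance matrix" the abstract mentions; a `relative_dispersion` below 1 then means less information (more integration) than the uncorrelated baseline.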

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Methods in Ecology and Evolution (160 papers in training set, Top 0.1%, 23.1%)
2. BMC Ecology and Evolution (49 papers in training set, Top 0.1%, 7.0%)
3. Ecology and Evolution (232 papers in training set, Top 0.5%, 5.0%)
4. Evolution (199 papers in training set, Top 0.6%, 5.0%)
5. Evolutionary Biology (10 papers in training set, Top 0.1%, 5.0%)
6. Bioinformatics (1061 papers in training set, Top 5%, 3.7%)
7. Peer Community Journal (254 papers in training set, Top 1%, 2.9%)
50% of probability mass above
8. Ecological Informatics (29 papers in training set, Top 0.2%, 2.9%)
9. PLOS ONE (4510 papers in training set, Top 47%, 2.1%)
10. PLOS Computational Biology (1633 papers in training set, Top 13%, 2.1%)
11. Journal of Evolutionary Biology (98 papers in training set, Top 0.4%, 2.1%)
12. Scientific Reports (3102 papers in training set, Top 55%, 1.8%)
13. Behavioral Ecology and Sociobiology (34 papers in training set, Top 0.3%, 1.8%)
14. Journal of Computational Biology (37 papers in training set, Top 0.1%, 1.8%)
15. Journal of The Royal Society Interface (189 papers in training set, Top 2%, 1.7%)
16. Royal Society Open Science (193 papers in training set, Top 2%, 1.7%)
17. The American Naturalist (114 papers in training set, Top 1%, 1.7%)
18. Biology Open (130 papers in training set, Top 2%, 1.3%)
19. Ecology Letters (121 papers in training set, Top 1.0%, 1.3%)
20. Systematic Biology (121 papers in training set, Top 0.3%, 1.3%)
21. Molecular Ecology Resources (161 papers in training set, Top 0.9%, 0.9%)
22. Bioinformatics Advances (184 papers in training set, Top 4%, 0.9%)
23. Journal of Theoretical Biology (144 papers in training set, Top 1%, 0.9%)
24. Ecography (50 papers in training set, Top 1%, 0.8%)
25. Developmental Biology (134 papers in training set, Top 2%, 0.7%)
26. Genetics (225 papers in training set, Top 4%, 0.7%)
27. Proceedings of the Royal Society B: Biological Sciences (341 papers in training set, Top 7%, 0.7%)
28. Bulletin of Mathematical Biology (84 papers in training set, Top 2%, 0.7%)
29. Philosophical Transactions of the Royal Society B (51 papers in training set, Top 6%, 0.7%)
30. GENETICS (189 papers in training set, Top 2%, 0.7%)