Shape Dimensionality Metrics for Landmark Data

O'Keefe, F. R.

2020-07-24 evolutionary biology

10.1101/2020.07.23.218289 bioRxiv

This paper is concerned with rank deficiency in phenotypic covariance matrices: first establishing that it is a problem by measuring it, and then proposing methods to treat it. Significant rank deficiency can mislead current measures of whole-shape phenotypic integration, because they rely on eigenvalues of the covariance matrix, and highly rank deficient matrices will have a large percentage of meaningless eigenvalues. This paper has three goals. The first is to examine a typical geometric morphometric data set and establish that its covariance matrix is rank deficient. We employ the concept of information, or Shannon, entropy to demonstrate that the covariance matrix of a sample of dire wolf jaws is highly rank deficient. The different sources of rank deficiency are identified, and include the Generalized Procrustes analysis itself, use of the correlation matrix, insufficient sample size, and phenotypic covariance. Only the last of these is of biological interest. Our second goal is to examine a test case where a change in integration is known, allowing us to document how rank deficiency affects two measures of whole shape integration (eigenvalue standard deviation and standardized generalized variance). This test case utilizes the dire wolf data set from Part 1, and introduces another population that is 5000 years older. Modularity models are generated and tested for both populations, showing that one population is more integrated than the other. We demonstrate that eigenvalue variance characterizes the integration change incorrectly, while the standardized generalized variance lacks sensitivity. Both metrics are impacted by the inclusion of many small eigenvalues arising from rank deficiency of the covariance matrix. We propose a modification of the standardized generalized variance, again based on information entropy, that considers only the eigenvalues carrying non-redundant information. We demonstrate that this metric is successful in identifying the integration change in the test case. The third goal of this paper is to generalize the new metric to the case of arbitrary sample size. This is done by normalizing the new metric to the amount of information present in a permuted covariance matrix. We term the resulting metric the relative dispersion, and it is sample size corrected. As a proof of concept we use the new metric to compare the dire wolf data set from the first part of this paper to a third data set comprising jaws of Smilodon fatalis. We demonstrate that the Smilodon jaw is much more integrated than the dire wolf jaw. Finally, these information entropy-based measures of integration allow comparison of whole shape integration in dense semilandmark environments, enabling characterization of the information content of any given shape, a quantity we term latent dispersion.
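
The entropy-based idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's formulas, so the code below is not the author's metric: it computes a generic "effective rank" as the exponential of the Shannon entropy of the normalized eigenvalue spectrum. The function names `eigenvalue_entropy` and `effective_rank` and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def eigenvalue_entropy(cov):
    """Shannon entropy (in nats) of the normalized eigenvalues of a
    symmetric covariance matrix. Hypothetical helper, not from the paper."""
    eigvals = np.linalg.eigvalsh(cov)
    # Round-off can produce tiny negative eigenvalues; clip them to zero.
    eigvals = np.clip(eigvals, 0.0, None)
    p = eigvals / eigvals.sum()
    # Zero eigenvalues contribute nothing (0 * log 0 is taken as 0).
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def effective_rank(cov):
    """exp(entropy): an entropy-based count of the dimensions that carry
    variance. A generic quantity, assumed here for illustration."""
    return float(np.exp(eigenvalue_entropy(cov)))


# Simulated example (not real landmark data): 10 variables driven by
# 3 latent factors plus a small amount of independent noise.
rng = np.random.default_rng(0)
n_specimens, n_vars, n_latent = 200, 10, 3
latent = rng.normal(size=(n_specimens, n_latent))
loadings = rng.normal(size=(n_latent, n_vars))
data = latent @ loadings + 0.01 * rng.normal(size=(n_specimens, n_vars))
cov = np.cov(data, rowvar=False)

print("nominal dimensions:", n_vars)
print("effective rank:", effective_rank(cov))
```

In this sketch an identity matrix has an effective rank equal to its dimension, while a matrix dominated by a few eigenvalues has an effective rank far below its nominal dimension, which is the sense in which many small eigenvalues carry little information.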
