
Sparsification of Large Ultrametric Matrices: Insights into the Microbial Tree of Life

Gorman, E. D.; Lladser, M. E.

2022-11-21 microbiology
10.1101/2022.08.21.504697 bioRxiv

Strictly ultrametric matrices appear in many domains of mathematics and science; nevertheless, they can be large and dense, making them difficult to store and manipulate, unlike large but sparse matrices. In this manuscript, we exploit the fact that strictly ultrametric matrices can be represented as binary trees to sparsify them via an orthonormal base change based on Haar-like wavelets. We show that, with overwhelmingly high probability, only an asymptotically negligible fraction of the off-diagonal entries in large random strictly ultrametric matrices remain non-zero after the base change, and we develop an algorithm to sparsify such matrices directly from their tree representation. We also identify the subclass of matrices diagonalized by the Haar-like wavelets and supply a sufficient condition to approximate the spectrum of strictly ultrametric matrices outside this subclass. Our methods give computational access to the covariance matrix of the microbiologists' Tree of Life, which was previously inaccessible due to its size, and motivate introducing a new wavelet-based (beta-diversity) metric to compare microbial environments. Unlike the established (beta-diversity) metrics, the new metric may be used to identify internal nodes (i.e., splits) in the Tree that link microbial composition and environmental factors in a statistically significant manner.

MSC codes: 05C05, 15A18, 42C40, 65F55, 92C70
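The base change described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, which works directly from the tree without forming the dense matrix. It makes several assumptions for illustration only: the tree is a nested tuple with integer leaves, every edge has length 1, the matrix entry for two leaves is the depth of their lowest common ancestor, and the wavelets use the usual Haar-like normalization on trees (one vector per internal node, plus a constant vector). The names `leaves`, `tree_to_matrix_and_basis`, and `offdiag_nonzeros` are hypothetical.

```python
import numpy as np

def leaves(t):
    """Leaf labels under a nested-tuple binary tree (leaves are ints)."""
    return [t] if isinstance(t, int) else leaves(t[0]) + leaves(t[1])

def tree_to_matrix_and_basis(tree):
    """Return (C, Phi): a tree-induced matrix and a Haar-like orthonormal basis.

    Assumption: unit edge lengths, so C[i, j] is the depth of the lowest
    common ancestor of leaves i and j, and C[i, i] is the depth of leaf i.
    """
    n = len(leaves(tree))
    C = np.zeros((n, n))
    columns = [np.full(n, 1.0 / np.sqrt(n))]  # constant vector

    def visit(t, depth):
        if isinstance(t, int):
            C[t, t] = depth  # strictly larger than every other entry in row t
            return
        L, R = leaves(t[0]), leaves(t[1])
        # Leaves split at this node share the path from the root down to it.
        C[np.ix_(L, R)] = depth
        C[np.ix_(R, L)] = depth
        # Haar-like wavelet: positive on the left leaves, negative on the
        # right leaves, scaled to have zero sum and unit norm.
        size = len(L) + len(R)
        w = np.zeros(n)
        w[L] = np.sqrt(len(R) / (len(L) * size))
        w[R] = -np.sqrt(len(L) / (len(R) * size))
        columns.append(w)
        visit(t[0], depth + 1.0)
        visit(t[1], depth + 1.0)

    visit(tree, 0.0)
    return C, np.column_stack(columns)

def offdiag_nonzeros(M, tol=1e-9):
    """Count off-diagonal entries whose magnitude exceeds tol."""
    mask = np.abs(M) > tol
    np.fill_diagonal(mask, False)
    return int(mask.sum())

# Hypothetical example tree with 8 leaves labeled 0..7.
tree = (((0, 1), (2, 3)), ((4, 5), (6, 7)))
C, Phi = tree_to_matrix_and_basis(tree)
n = C.shape[0]
S = Phi.T @ C @ Phi  # the same matrix expressed in the wavelet basis

print("off-diagonal non-zeros before:", offdiag_nonzeros(C))
print("off-diagonal non-zeros after: ", offdiag_nonzeros(S))
```

Because `Phi` is orthonormal, `S` has the same spectrum as `C`. How many off-diagonal entries survive depends on the shape and edge lengths of the tree, so the counts for this toy example should not be read as representative of the large random matrices studied in the manuscript.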

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Journal of The Royal Society Interface: 189 papers in training set, Top 0.1%, 26.3%
2. Bulletin of Mathematical Biology: 84 papers in training set, Top 0.1%, 22.8%
3. PLOS Computational Biology: 1633 papers in training set, Top 2%, 12.7%
50% of probability mass above
4. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 11%, 6.5%
5. Ecology Letters: 121 papers in training set, Top 0.5%, 2.8%
6. Theoretical Population Biology: 47 papers in training set, Top 0.1%, 2.1%
7. Cell Systems: 167 papers in training set, Top 6%, 1.9%
8. Physical Review Research: 46 papers in training set, Top 0.3%, 1.9%
9. Physical Review E: 95 papers in training set, Top 0.6%, 1.8%
10. mSystems: 361 papers in training set, Top 5%, 1.7%
11. Journal of Theoretical Biology: 144 papers in training set, Top 0.9%, 1.7%
12. Nature Communications: 4913 papers in training set, Top 55%, 1.3%
13. Scientific Reports: 3102 papers in training set, Top 72%, 0.8%
14. Mathematics: 11 papers in training set, Top 0.4%, 0.8%
15. Frontiers in Microbiology: 375 papers in training set, Top 9%, 0.8%
16. PLOS ONE: 4510 papers in training set, Top 67%, 0.8%
17. PNAS Nexus: 147 papers in training set, Top 2%, 0.7%
18. Genetics: 225 papers in training set, Top 5%, 0.7%
19. Bioinformatics: 1061 papers in training set, Top 10%, 0.7%
20. Emerging Infectious Diseases: 103 papers in training set, Top 4%, 0.5%
21. Science: 429 papers in training set, Top 22%, 0.5%
22. Cell Reports: 1338 papers in training set, Top 36%, 0.5%