Approximate k-nearest neighbors graph for single-cell Hi-C dimensional reduction with MinHash
Wolff, J.; Backofen, R.; Gruening, B.
Show abstract
Single-cell Hi-C interaction matrices are high dimensional and very sparse. To cluster thousands of single-cell Hi-C interaction matrices they are flattened and compiled into one matrix. This matrix can, depending on the resolution, have a few millions or even billions of features and any computation with it is therefore memory demanding. A common approach to reduce the number of features is to compute a nearest neighbors graph. However, the exact euclidean distance computation is in O(n2) and therefore we present an implementation of an approximate nearest neighbors method based on local sensitive hashing running in O(n). The presented method is able to process a 10kb single-cell Hi-C data set with 2500 cells and needs 53 GB of memory while the exact k-nearest neighbors approach is not computable with 1 TB of memory.
Matching journals
The top 5 journals account for 50% of the predicted probability mass.