
Approximate k-nearest neighbors graph for single-cell Hi-C dimensional reduction with MinHash

Wolff, J.; Backofen, R.; Gruening, B.

2020-03-05 bioinformatics
10.1101/2020.03.05.978569 bioRxiv

Single-cell Hi-C interaction matrices are high-dimensional and very sparse. To cluster thousands of single-cell Hi-C interaction matrices, they are flattened and compiled into one matrix. Depending on the resolution, this matrix can have a few million or even billions of features, so any computation on it is memory-demanding. A common approach to reduce the number of features is to compute a nearest neighbors graph. However, the exact Euclidean distance computation is in O(n^2); we therefore present an implementation of an approximate nearest neighbors method based on locality-sensitive hashing that runs in O(n). The presented method processes a 10 kb single-cell Hi-C data set with 2500 cells using 53 GB of memory, whereas the exact k-nearest neighbors approach is not computable with 1 TB of memory.
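The idea of replacing exact pairwise distances with MinHash collisions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`make_hash_params`, `minhash_signature`, `approximate_knn`), the linear hash family, the number of hash functions, and the representation of each cell as a set of non-zero feature indices are all assumptions made for illustration. Cells that share many non-zero features tend to share minimum hash values, so counting bucket collisions gives an approximate neighbor ranking without computing every pairwise distance.

```python
import random
from collections import Counter, defaultdict

PRIME = (1 << 61) - 1  # large Mersenne prime for the (assumed) linear hash family


def make_hash_params(num_hashes, rng):
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(features, params):
    """MinHash signature of a non-empty set of non-zero feature indices."""
    return [min((a * f + b) % PRIME for f in features) for a, b in params]


def approximate_knn(cells, k, num_hashes=64, seed=0):
    """Rank neighbors of each cell by the number of shared MinHash values."""
    rng = random.Random(seed)
    params = make_hash_params(num_hashes, rng)
    signatures = [minhash_signature(cell, params) for cell in cells]

    # Bucket cells by (hash function index, minimum hash value).
    buckets = defaultdict(list)
    for cell_id, signature in enumerate(signatures):
        for i, value in enumerate(signature):
            buckets[(i, value)].append(cell_id)

    # Two cells collide in a bucket with probability close to their
    # Jaccard similarity, so collision counts approximate similarity.
    neighbors = []
    for cell_id, signature in enumerate(signatures):
        collisions = Counter()
        for i, value in enumerate(signature):
            for other in buckets[(i, value)]:
                if other != cell_id:
                    collisions[other] += 1
        neighbors.append([other for other, _ in collisions.most_common(k)])
    return neighbors


# Toy data: each "cell" is the set of indices of its non-zero interactions.
example_cells = [
    {1, 2, 3, 4, 5},
    {1, 2, 3, 4, 6},
    {100, 101, 102, 103},
    {100, 101, 102, 104},
]
example_neighbors = approximate_knn(example_cells, k=1)
```

In this toy input, cells 0 and 1 overlap heavily, as do cells 2 and 3, while the two groups share no features. Only the sets of non-zero indices are stored, which is what makes this kind of sketch attractive for very sparse matrices.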

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Bioinformatics | 1061 | Top 2% | 18.3%
2 | PLOS ONE | 4510 | Top 13% | 14.4%
3 | Algorithms for Molecular Biology | 15 | Top 0.1% | 6.8%
4 | Nature Communications | 4913 | Top 26% | 6.8%
5 | iScience | 1063 | Top 2% | 4.9%
(50% of probability mass above)
6 | Scientific Reports | 3102 | Top 24% | 4.9%
7 | Peer Community Journal | 254 | Top 0.6% | 4.3%
8 | Genome Research | 409 | Top 0.8% | 4.0%
9 | BMC Bioinformatics | 383 | Top 3% | 3.6%
10 | Genome Biology | 555 | Top 3% | 2.4%
11 | PLOS Computational Biology | 1633 | Top 15% | 1.8%
12 | Communications Biology | 886 | Top 8% | 1.7%
13 | GigaScience | 172 | Top 1% | 1.7%
14 | Nature Methods | 336 | Top 5% | 1.2%
15 | Cell Systems | 167 | Top 10% | 1.0%
16 | Journal of Computational Biology | 37 | Top 0.4% | 1.0%
17 | Frontiers in Neuroinformatics | 38 | Top 0.6% | 0.9%
18 | Nature Biotechnology | 147 | Top 7% | 0.8%
19 | Nucleic Acids Research | 1128 | Top 16% | 0.8%
20 | Journal of Molecular Biology | 217 | Top 4% | 0.7%
21 | Frontiers in Bioinformatics | 45 | Top 0.9% | 0.7%
22 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.7% | 0.7%
23 | Bioinformatics Advances | 184 | Top 5% | 0.7%
24 | Journal of Open Source Software | 22 | Top 0.3% | 0.6%
25 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 | Top 0.7% | 0.6%