BionetBF: A Novel Bloom Filter for Faster Membership Identification of Paired Biological Network Data

Nayak, S.; Patgiri, R.

2021-09-24 bioinformatics
10.1101/2021.09.23.461527 bioRxiv
A Big Graph is a graph having thousands of vertices and hundreds of thousands of edges. The study of graphs is crucial because the interlinkage among the vertices provides various insights and uncovers hidden facts that arise from their relationships. Graph processing has non-linear time complexity, and the overwhelming number of vertices and edges in a Big Graph increases the processing complexity many times over. One of the significant challenges is searching for an edge in a Big Graph. In this article, we propose a novel Bloom Filter called Biological network Bloom Filter (BionetBF) for fast membership identification of biological network edges or paired biological data. BionetBF is capable of executing millions of operations within a second while occupying a tiny main memory footprint. We conducted rigorous experiments to evaluate the performance of BionetBF on large datasets, using 12 synthetic datasets and three biological network datasets. Insertion and query of 40 million biological edges take less than 8 sec. BionetBF delivers this performance while maintaining a false positive probability of 0.001. BionetBF is compared with two other filters, Cuckoo Filter and Libbloom; a small-sized BionetBF outperforms the large-sized Cuckoo Filter and Libbloom. The source code, written in the C programming language, is available at https://github.com/patgiri/BionetBF. All data are available at the given link.

Highlights
- Proposed a novel Bloom Filter, BionetBF, for faster boolean query on Big Graph.
- BionetBF has a low memory footprint and the lowest false positive probability.
- It has high performance with constant searching time complexity.
- BionetBF has potential applications in Big Graph, de-Bruijn Graph, and Drug Discovery.
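
To make the idea of edge membership testing concrete, the following is a minimal sketch of a conventional Bloom filter keyed on an ordered pair of vertex identifiers. It is not the BionetBF implementation from the repository above. The type `EdgeBloom`, the functions `edgebloom_new`, `edgebloom_insert`, `edgebloom_query`, and `edgebloom_free`, the use of FNV-1a hashing, and the double-hashing probe scheme are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Standard 64-bit FNV-1a constants. */
#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* Hypothetical illustrative type; not taken from the BionetBF source. */
typedef struct {
    uint8_t *bits;    /* bit array holding n_bits bits */
    uint64_t n_bits;  /* filter size in bits */
    unsigned k;       /* number of probes per edge */
} EdgeBloom;

/* 64-bit FNV-1a over a NUL-terminated string, continuing from state h. */
static uint64_t fnv1a(const char *s, uint64_t h) {
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= FNV_PRIME;
    }
    return h;
}

/* Hash the ordered pair (u, v). A separator byte is mixed in between the
 * two strings so that ("ab", "c") and ("a", "bc") hash differently. */
static uint64_t pair_hash(const char *u, const char *v, uint64_t seed) {
    uint64_t h = fnv1a(u, seed);
    h = (h ^ 0xffu) * FNV_PRIME;
    return fnv1a(v, h);
}

static EdgeBloom *edgebloom_new(uint64_t n_bits, unsigned k) {
    assert(n_bits > 0 && k > 0);
    EdgeBloom *bf = malloc(sizeof *bf);
    assert(bf != NULL);
    bf->bits = calloc((size_t)((n_bits + 7) / 8), 1);  /* all bits start at 0 */
    assert(bf->bits != NULL);
    bf->n_bits = n_bits;
    bf->k = k;
    return bf;
}

static void edgebloom_free(EdgeBloom *bf) {
    free(bf->bits);
    free(bf);
}

/* Set k bits for edge (u, v); probe i uses position (h1 + i*h2) mod n_bits. */
static void edgebloom_insert(EdgeBloom *bf, const char *u, const char *v) {
    uint64_t h1 = pair_hash(u, v, FNV_OFFSET);
    uint64_t h2 = pair_hash(u, v, ~FNV_OFFSET) | 1;  /* odd, so never zero */
    for (unsigned i = 0; i < bf->k; ++i) {
        uint64_t pos = (h1 + i * h2) % bf->n_bits;
        bf->bits[pos >> 3] |= (uint8_t)(1u << (pos & 7));
    }
}

/* Return 1 if (u, v) may be present, 0 if it is definitely absent. */
static int edgebloom_query(const EdgeBloom *bf, const char *u, const char *v) {
    uint64_t h1 = pair_hash(u, v, FNV_OFFSET);
    uint64_t h2 = pair_hash(u, v, ~FNV_OFFSET) | 1;
    for (unsigned i = 0; i < bf->k; ++i) {
        uint64_t pos = (h1 + i * h2) % bf->n_bits;
        if (!(bf->bits[pos >> 3] & (1u << (pos & 7))))
            return 0;
    }
    return 1;
}
```

In this sketch an edge that has been inserted is always reported as present, while an edge that was never inserted may occasionally be reported as present (a false positive). Each insert or query touches exactly `k` bit positions regardless of how many edges are stored, which is the sense in which Bloom filter lookups take constant time. The sketch treats (u, v) and (v, u) as different keys; an undirected network would need the two identifiers put into a canonical order first.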

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Bioinformatics (1061 papers in training set): Top 0.9%, 23.6%
2. BMC Bioinformatics (383 papers in training set): Top 0.6%, 12.9%
3. Briefings in Bioinformatics (326 papers in training set): Top 0.6%, 7.5%
4. IEEE/ACM Transactions on Computational Biology and Bioinformatics (32 papers in training set): Top 0.1%, 7.5%
50% of probability mass above
5. PLOS ONE (4510 papers in training set): Top 43%, 3.0%
6. IEEE Journal of Biomedical and Health Informatics (34 papers in training set): Top 0.6%, 2.9%
7. Journal of Computational Biology (37 papers in training set): Top 0.1%, 2.7%
8. Bioinformatics Advances (184 papers in training set): Top 2%, 2.5%
9. Computational and Structural Biotechnology Journal (216 papers in training set): Top 3%, 2.0%
10. Frontiers in Genetics (197 papers in training set): Top 4%, 2.0%
11. Scientific Reports (3102 papers in training set): Top 54%, 1.9%
12. PLOS Computational Biology (1633 papers in training set): Top 15%, 1.8%
13. Computers in Biology and Medicine (120 papers in training set): Top 2%, 1.8%
14. Genomics, Proteomics & Bioinformatics (171 papers in training set): Top 3%, 1.7%
15. GigaScience (172 papers in training set): Top 2%, 1.6%
16. IEEE Transactions on Computational Biology and Bioinformatics (17 papers in training set): Top 0.3%, 1.4%
17. PeerJ (261 papers in training set): Top 11%, 1.0%
18. Journal of Molecular Biology (217 papers in training set): Top 3%, 0.9%
19. Patterns (70 papers in training set): Top 2%, 0.9%
20. Computational Biology and Chemistry (23 papers in training set): Top 0.4%, 0.8%
21. IEEE Access (31 papers in training set): Top 0.9%, 0.8%
22. Neurocomputing (13 papers in training set): Top 0.5%, 0.8%
23. BioData Mining (15 papers in training set): Top 0.8%, 0.8%
24. Journal of Medical Internet Research (85 papers in training set): Top 4%, 0.8%
25. Neuroinformatics (40 papers in training set): Top 1%, 0.7%
26. Life (27 papers in training set): Top 0.7%, 0.5%
27. Quantitative Biology (11 papers in training set): Top 1.0%, 0.5%
28. iScience (1063 papers in training set): Top 39%, 0.5%