BionetBF: A Novel Bloom Filter for Faster Membership Identification of Paired Biological Network Data

Nayak, S.; Patgiri, R.

2021-09-24 bioinformatics

10.1101/2021.09.23.461527 bioRxiv


A Big Graph is a graph with thousands of vertices and hundreds of thousands of edges. Studying graphs is crucial because the links among vertices provide insights and reveal hidden patterns that arise from their relationships. Graph processing has non-linear time complexity, and the overwhelming number of vertices and edges in a Big Graph increases the processing cost many times over. One significant challenge is searching for an edge in a Big Graph. In this article, we propose a novel Bloom Filter, the Biological network Bloom Filter (BionetBF), for fast membership identification of biological network edges, or paired biological data. BionetBF can execute millions of operations within a second while occupying a tiny main-memory footprint. We conducted rigorous experiments to evaluate the performance of BionetBF on large datasets, using 12 synthetic datasets and three biological network datasets. Insertion and query of 40 million biological edges take less than 8 seconds. BionetBF delivers this performance while maintaining a false positive probability of 0.001. BionetBF is compared with two other filters, Cuckoo Filter and Libbloom; a small-sized BionetBF outperforms the large-sized Cuckoo Filter and Libbloom. The source code, written in the C programming language, is available at https://github.com/patgiri/BionetBF. All data are available at the same link.

Highlights
- Proposed a novel Bloom Filter, BionetBF, for faster boolean queries on Big Graph.
- BionetBF has a low memory footprint and the lowest false positive probability.
- It has high performance with constant searching time complexity.
- BionetBF has potential applications in Big Graph, de-Bruijn Graph, and Drug Discovery.
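To make the idea of edge membership testing concrete, here is a minimal sketch of a classic Bloom filter keyed on an ordered pair of vertex labels. It is not the BionetBF design or its API: every name (`pair_bf`, `pair_bf_insert`, `pair_bf_query`), the FNV-1a hash, the double-hashing probe scheme, and the separator byte are assumptions chosen for illustration only. The authors' actual implementation is in the linked repository.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT the BionetBF API. */
typedef struct {
    uint8_t *bits;   /* bit array holding nbits bits */
    uint64_t nbits;  /* filter size in bits */
    unsigned k;      /* number of probes per key */
} pair_bf;

/* 64-bit FNV-1a over a NUL-terminated string, continuing from state h. */
static uint64_t fnv1a(uint64_t h, const char *s) {
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hash the ordered pair (u, v). A separator byte (0x1f, assumed absent
   from labels) keeps ("ab","c") distinct from ("a","bc"). */
static uint64_t hash_pair(uint64_t seed, const char *u, const char *v) {
    uint64_t h = fnv1a(seed, u);
    h ^= 0x1f;
    h *= 1099511628211ULL;
    return fnv1a(h, v);
}

/* Allocate a zeroed filter; returns 0 on success, -1 if allocation fails. */
static int pair_bf_init(pair_bf *f, uint64_t nbits, unsigned k) {
    assert(nbits > 0 && k > 0);
    f->bits = calloc((size_t)((nbits + 7) / 8), 1);
    f->nbits = nbits;
    f->k = k;
    return f->bits ? 0 : -1;
}

/* i-th probe position via double hashing: (h1 + i * h2) mod nbits. */
static uint64_t probe(const pair_bf *f, uint64_t h1, uint64_t h2, unsigned i) {
    return (h1 + (uint64_t)i * h2) % f->nbits;
}

/* Record the edge (u, v) by setting k bits. */
static void pair_bf_insert(pair_bf *f, const char *u, const char *v) {
    uint64_t h1 = hash_pair(14695981039346656037ULL, u, v);
    uint64_t h2 = hash_pair(h1, u, v) | 1;  /* odd step, never zero */
    for (unsigned i = 0; i < f->k; ++i) {
        uint64_t b = probe(f, h1, h2, i);
        f->bits[b >> 3] |= (uint8_t)(1u << (b & 7));
    }
}

/* Returns 0 if (u, v) was definitely never inserted,
   1 if it is possibly present (false positives can occur). */
static int pair_bf_query(const pair_bf *f, const char *u, const char *v) {
    uint64_t h1 = hash_pair(14695981039346656037ULL, u, v);
    uint64_t h2 = hash_pair(h1, u, v) | 1;
    for (unsigned i = 0; i < f->k; ++i) {
        uint64_t b = probe(f, h1, h2, i);
        if (!(f->bits[b >> 3] & (1u << (b & 7))))
            return 0;
    }
    return 1;
}

/* Release the bit array. */
static void pair_bf_free(pair_bf *f) {
    free(f->bits);
    f->bits = NULL;
}
```

A caller would initialise the filter once, call `pair_bf_insert(&f, "geneA", "geneB")` for each edge, and then test candidate edges with `pair_bf_query`. The sketch treats pairs as ordered, so an undirected network would need the two labels put in a canonical order first. For a classic Bloom filter holding n keys at target false positive probability p, the textbook sizing is m = -n ln p / (ln 2)^2 bits with k = (m/n) ln 2 probes. Whether BionetBF follows that sizing is not stated in the abstract.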

