
Bias in miRNA enrichment analysis related to gene functional annotations

Zagganas, K.; Georgakilas, G. K.; Vergoulis, T.; Dalamagas, T.

2021-08-17 bioinformatics
10.1101/2021.08.16.456527 bioRxiv

Background: miRNA functional enrichment is a type of analysis used to predict which biological functions may be affected by a group of miRNAs, or to validate whether a list of dysregulated miRNAs is linked to a diseased state. The standard method for functional enrichment analysis uses the hypergeometric distribution to produce p-values that express the strength of the association between a group of miRNAs and a biological function. However, in 2015 it was shown that this approach suffers from a bias related to the miRNA targets produced by target prediction algorithms, and a new randomization test was proposed to alleviate this issue.

Results: We demonstrate the existence of another, previously unreported underlying bias, which affects gene annotation data sets. Additionally, we show that the statistical measure used for the established randomization test is not sensitive enough to account for it. In this context, we show that using the Jaccard coefficient (an alternative statistical measure) alleviates the aforementioned issue.

Conclusions: In this paper, we illustrate the existence of a new bias affecting miRNA functional enrichment analysis. This bias makes Fisher's exact test unsuitable for miRNA functional enrichment analyses, and the established unbiased test also needs to be adjusted accordingly. We propose a modified version of the established test and, to facilitate its use, we introduce a novel unbiased miRNA enrichment analysis tool that implements the proposed method. By leveraging bit vectors, our tool also guarantees fast and scalable execution.

Availability: All datasets used in the experiments throughout this paper are openly accessible on Zenodo (https://doi.org/10.5281/zenodo.5175819).
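To make the two statistical measures named in the abstract concrete, here is a minimal Python sketch, not the authors' tool. It computes a hypergeometric upper-tail p-value and a Jaccard coefficient for one gene set against one annotation category, storing each gene set as an integer bit vector. The gene names, the toy universe of eight genes, and all function names (`to_bits`, `popcount`, `jaccard`, `hypergeom_pvalue`) are assumptions made for illustration. The randomization step of the unbiased test is not shown.

```python
from math import comb


def to_bits(genes, index):
    """Encode a gene set as an integer bit vector (bit i set if gene i is present)."""
    bits = 0
    for gene in genes:
        bits |= 1 << index[gene]
    return bits


def popcount(bits):
    """Number of set bits, i.e. the size of the encoded gene set."""
    return bin(bits).count("1")


def jaccard(a, b):
    """Jaccard coefficient |A and B| / |A or B| of two bit-vector gene sets."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def hypergeom_pvalue(k, population, successes, draws):
    """Upper tail P(X >= k) of the hypergeometric distribution."""
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return tail / total


# Hypothetical toy data: a universe of 8 genes, the predicted targets of a
# miRNA group, and the genes annotated to one functional category.
universe = ["g%d" % i for i in range(8)]
index = {gene: i for i, gene in enumerate(universe)}

targets = to_bits(["g0", "g1", "g2", "g3"], index)
category = to_bits(["g2", "g3", "g4"], index)

overlap = popcount(targets & category)
p_value = hypergeom_pvalue(overlap, len(universe), popcount(category), popcount(targets))

print("overlap:", overlap)                      # 2
print("Jaccard:", jaccard(targets, category))   # 0.4
print("hypergeometric p-value:", p_value)       # 0.5
```

With bit vectors, intersection and union reduce to a single bitwise AND or OR followed by a bit count, which is the kind of operation that keeps repeated set comparisons cheap.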

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.