
Estimating mutual information under measurement error

Ma, C.; Kingsford, C.

2019-11-23 bioinformatics
10.1101/852384 bioRxiv

Mutual information is widely used to characterize dependence between biological signals, such as co-expression between genes or co-evolution between amino acids. However, measurement error in these signals is rarely considered when estimating mutual information. Measurement error is widespread and, in some cases, non-negligible. As a result, the distribution of the signals is blurred, and mutual information estimated from the blurred measurements may be biased. We derive a corrected estimator for mutual information that accounts for the distribution of measurement error. Our corrected estimator is based on correcting the probability mass function (PMF) or the probability density function (PDF, estimated by kernel density estimation). We prove that the corrected estimator is asymptotically unbiased in the (semi-)discrete case when the distribution of measurement error is known. We show that it reduces the estimation bias in the continuous case under certain assumptions. On simulated data, our corrected estimator yields more accurate estimates of mutual information when the sample size is not the limiting factor for estimating the PMF or PDF accurately. We compare the uncorrected and corrected estimators on gene expression data from TCGA breast cancer samples and show that the two estimators differ in both the values and the rankings of the estimated mutual information.
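As a rough illustration of the discrete-case idea in the abstract (a sketch, not the authors' estimator), the code below assumes that measurement error acts as a known, invertible misclassification matrix applied independently to X and Y. Under that assumption the observed joint PMF is E_X^T P E_Y, where P is the true joint PMF and E_X[i, k] = Pr(observed x = k | true x = i), so P can be recovered as (E_X^T)^-1 P_obs E_Y^-1 and plugged into the usual mutual information formula. The function names plugin_mi and corrected_joint_pmf, the 10% flip rate, and the use of exact population PMFs instead of finite samples are all hypothetical choices made for illustration.

```python
import math

import numpy as np


def plugin_mi(pxy):
    """Plug-in mutual information (in nats) from a joint PMF matrix pxy[i, j]."""
    pxy = np.asarray(pxy, dtype=float)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of X, shape (n, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, m)
    nz = pxy > 0  # 0 * log 0 is treated as 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def corrected_joint_pmf(obs_pxy, ex, ey):
    """Invert a known discrete error channel (hypothetical helper).

    Assumes obs_pxy = ex.T @ true @ ey, where ex[i, k] = P(observed x = k | true x = i)
    and ey is defined the same way for Y; both must be invertible.
    """
    true = np.linalg.solve(ex.T, obs_pxy) @ np.linalg.inv(ey)
    # Sampling noise or rounding can give small negative entries: clip and renormalize.
    true = np.clip(true, 0.0, None)
    return true / true.sum()


# Usage: perfectly dependent binary variables observed through a 10% flip channel.
true_pxy = np.array([[0.5, 0.0], [0.0, 0.5]])
flip = np.array([[0.9, 0.1], [0.1, 0.9]])  # hypothetical misclassification matrix
obs_pxy = flip.T @ true_pxy @ flip  # blurred joint PMF

print(plugin_mi(true_pxy))  # ≈ 0.693 (= log 2)
print(plugin_mi(obs_pxy))  # smaller: measurement error attenuates the estimate
print(plugin_mi(corrected_joint_pmf(obs_pxy, flip, flip)))  # back to about log 2
print(math.log(2))
```

With finite samples the inverted PMF can contain negative entries; clipping and renormalizing them is a simple heuristic used here for convenience, not a step taken from the paper.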

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Bioinformatics; 1061 papers in training set; Top 1%; 18.6%
2. PLOS Computational Biology; 1633 papers in training set; Top 3%; 10.1%
3. Biometrics; 22 papers in training set; Top 0.1%; 7.2%
4. Biostatistics; 21 papers in training set; Top 0.1%; 4.9%
5. PLOS ONE; 4510 papers in training set; Top 34%; 4.3%
6. BMC Bioinformatics; 383 papers in training set; Top 2%; 4.3%
7. Journal of Computational Biology; 37 papers in training set; Top 0.1%; 4.3%
50% of probability mass above
8. The Annals of Applied Statistics; 15 papers in training set; Top 0.1%; 4.2%
9. Statistics in Medicine; 34 papers in training set; Top 0.1%; 4.0%
10. Physical Review E; 95 papers in training set; Top 0.4%; 3.1%
11. Scientific Reports; 3102 papers in training set; Top 41%; 3.1%
12. Biophysical Journal; 545 papers in training set; Top 2%; 2.1%
13. Physical Review Research; 46 papers in training set; Top 0.3%; 1.9%
14. Physical Biology; 43 papers in training set; Top 1%; 1.5%
15. Briefings in Bioinformatics; 326 papers in training set; Top 5%; 1.3%
16. Entropy; 20 papers in training set; Top 0.2%; 1.2%
17. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 38%; 1.2%
18. Journal of The Royal Society Interface; 189 papers in training set; Top 4%; 0.9%
19. Nature Communications; 4913 papers in training set; Top 59%; 0.9%
20. Journal of Bioinformatics and Systems Biology; 14 papers in training set; Top 0.6%; 0.8%
21. Journal of Theoretical Biology; 144 papers in training set; Top 2%; 0.8%
22. Journal of Mathematical Biology; 37 papers in training set; Top 0.3%; 0.7%
23. Brain Topography; 23 papers in training set; Top 0.4%; 0.7%
24. Neural Networks; 32 papers in training set; Top 0.8%; 0.7%
25. Theoretical Population Biology; 47 papers in training set; Top 0.2%; 0.7%
26. Interface Focus; 14 papers in training set; Top 0.3%; 0.7%
27. NeuroImage; 813 papers in training set; Top 6%; 0.6%
28. Bulletin of Mathematical Biology; 84 papers in training set; Top 2%; 0.6%
29. Frontiers in Genetics; 197 papers in training set; Top 11%; 0.6%
30. Communications Biology; 886 papers in training set; Top 29%; 0.6%