Corruption of the Pearson correlation coefficient by measurement error: estimation, bias, and correction under different error models

Saccenti, E.; Hendriks, M.; Smilde, A.

2019-06-14 bioinformatics
10.1101/671693 bioRxiv

Correlation coefficients are widely used in the life sciences. Their use ranges from simple exploratory analysis and the construction of association networks for visualization to serving as basic ingredients of sophisticated multivariate data analysis methods. Reliable estimates of correlation coefficients are therefore important. In the modern life sciences, comprehensive measurement techniques are used to measure metabolites, proteins, gene expression and other types of data. All of these techniques have measurement errors. With the simple measurements of the past the errors were also simple, but that is no longer the case: errors are heterogeneous, non-constant and not independent. This seriously degrades the quality of the estimated correlation coefficients. We discuss the different types of error present in modern comprehensive life science data and show with theory, simulations and real-life data how they affect correlation coefficients. We also briefly discuss ways to improve the estimation of such coefficients.
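To make the effect described in the abstract concrete, here is a minimal simulation sketch of the simplest textbook case only: additive, independent measurement error with equal variance on both variables. It is not one of the heterogeneous or correlated error models the paper addresses. The function name `simulate_attenuation`, the unit-variance bivariate normal signal and all parameter values are assumptions made for illustration. Under these assumptions the expected observed correlation is the true correlation divided by (1 + error variance).

```python
import numpy as np


def simulate_attenuation(rho=0.8, noise_sd=1.0, n=200_000, seed=0):
    """Compare the correlation of error-free and error-corrupted measurements.

    Assumed setup (illustrative, not taken from the paper): two unit-variance
    signals with true correlation `rho`, each corrupted by independent
    additive Gaussian error with standard deviation `noise_sd`.
    """
    rng = np.random.default_rng(seed)

    # Error-free signals drawn from a bivariate normal with correlation rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    true = rng.multivariate_normal([0.0, 0.0], cov, size=n)

    # Additive error, independent of the signal and between the two variables.
    noisy = true + rng.normal(0.0, noise_sd, size=true.shape)

    r_true = np.corrcoef(true[:, 0], true[:, 1])[0, 1]
    r_noisy = np.corrcoef(noisy[:, 0], noisy[:, 1])[0, 1]

    # Classical attenuation for unit signal variance and equal error variance:
    # rho / sqrt((1 + s^2) * (1 + s^2)) = rho / (1 + s^2).
    expected = rho / (1.0 + noise_sd ** 2)
    return r_true, r_noisy, expected


if __name__ == "__main__":
    r_true, r_noisy, expected = simulate_attenuation()
    print(f"error-free r: {r_true:.3f}")
    print(f"observed r:   {r_noisy:.3f}")
    print(f"expected r:   {expected:.3f}")
```

In this simple setting the observed coefficient is biased toward zero. Whether and how that carries over to non-constant or correlated errors is the subject of the paper itself and is not captured by this sketch.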

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. PLOS ONE: 4510 papers in training set, Top 18%, 10.4%
2. Physical Biology: 43 papers in training set, Top 0.1%, 7.4%
3. Scientific Reports: 3102 papers in training set, Top 13%, 7.0%
4. PLOS Computational Biology: 1633 papers in training set, Top 5%, 6.5%
5. BioSystems: 11 papers in training set, Top 0.1%, 5.0%
6. PeerJ: 261 papers in training set, Top 0.9%, 5.0%
7. BMC Bioinformatics: 383 papers in training set, Top 2%, 4.1%
8. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 3%, 1.9%
9. Entropy: 20 papers in training set, Top 0.1%, 1.9%
10. Journal of Bioinformatics and Systems Biology: 14 papers in training set, Top 0.1%, 1.8%

50% of probability mass above

11. Bioinformatics: 1061 papers in training set, Top 7%, 1.7%
12. Chaos, Solitons & Fractals: 32 papers in training set, Top 1.0%, 1.7%
13. Vaccines: 196 papers in training set, Top 1%, 1.7%
14. Physical Review E: 95 papers in training set, Top 0.6%, 1.7%
15. Computational Biology and Chemistry: 23 papers in training set, Top 0.1%, 1.7%
16. Biosystems: 18 papers in training set, Top 0.2%, 1.3%
17. Computers in Biology and Medicine: 120 papers in training set, Top 3%, 1.3%
18. Frontiers in Neuroscience: 223 papers in training set, Top 6%, 1.0%
19. Frontiers in Cell and Developmental Biology: 218 papers in training set, Top 7%, 1.0%
20. npj Systems Biology and Applications: 99 papers in training set, Top 2%, 0.9%
21. SoftwareX: 15 papers in training set, Top 0.4%, 0.8%
22. The European Physical Journal Plus: 13 papers in training set, Top 0.7%, 0.8%
23. Frontiers in Physics: 20 papers in training set, Top 0.8%, 0.8%
24. Biology: 43 papers in training set, Top 2%, 0.8%
25. Statistics in Medicine: 34 papers in training set, Top 0.3%, 0.8%
26. Journal of Biosciences: 12 papers in training set, Top 0.1%, 0.8%
27. Journal of Computational Biology: 37 papers in training set, Top 0.6%, 0.8%
28. Journal of The Royal Society Interface: 189 papers in training set, Top 4%, 0.8%
29. Royal Society Open Science: 193 papers in training set, Top 5%, 0.8%
30. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences: 15 papers in training set, Top 0.8%, 0.8%