Bayesian statistics improves biological interpretability of metabolomics data from human cohorts

Brydges, C.; Che, X.; Lipkin, W. I.; Fiehn, O.

2022-05-19 bioinformatics
10.1101/2022.05.17.492312 bioRxiv

Background: Univariate analyses of metabolomics data currently follow a frequentist approach, using p-values to reject a null hypothesis. However, the usability of p-values is plagued by many misconceptions and inherent pitfalls. Here we propose the use of Bayesian statistics to quantify the evidence supporting different hypotheses and to discriminate between support for the null hypothesis and a lack of statistical power.

Methods: We use metabolomics data from three independent human cohorts that studied plasma signatures of subjects with myalgic encephalomyelitis / chronic fatigue syndrome (ME/CFS). Data are publicly available, covering 84-197 subjects in each study with 562-888 identified metabolites, of which 777 were common between two studies and 93 compounds were reported in all three studies. By comparing results from classic multiple regression against Bayesian multiple regression, we show how Bayesian statistics incorporates results from one study as prior information into the next study, thereby improving the overall assessment of the likelihood of finding specific differences between plasma metabolite levels and disease outcomes in ME/CFS.

Results: Using classic statistics and Benjamini-Hochberg FDR corrections, study 1 detected 18 metabolic differences, whereas study 2 detected none. Using Bayesian statistics on the same data, we found a high likelihood that 97 compounds were altered in concentration in study 2, after using the results of study 1 as prior distributions. These findings included lower levels of peroxisome-produced ether-lipids, higher levels of long-chain, unsaturated triacylglycerides, and the presence of exposome compounds that are explained by differences in diet and medication between healthy subjects and ME/CFS patients. Although study 3 reported only 92 compounds in common with the other two studies, these major differences were confirmed. We also found that prostaglandin F2alpha, a lipid mediator of physiological relevance, was significantly reduced in ME/CFS patients across all three studies.

Conclusions: The use of Bayesian statistics led to biological conclusions from metabolomic data that were not found through the frequentist analytical approaches more commonly employed. We propose that Bayesian statistics is highly useful for studies with similar research designs if similar metabolomic assays are used.
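
The idea of carrying one study's result forward as the prior for the next can be illustrated with a minimal sketch. It assumes a single regression coefficient with a normal prior and a normally distributed estimate (a conjugate normal-normal update), which is a simplification chosen for illustration and not necessarily the model used by the authors. The function names and all numbers below are hypothetical.

```python
import math

def normal_update(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal-normal update for one coefficient.

    Combines a normal prior with a normally distributed estimate by
    precision weighting and returns the posterior mean and sd.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * estimate) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)

def prob_below_zero(mean, sd):
    """Probability that a Normal(mean, sd) coefficient is negative."""
    return 0.5 * (1.0 + math.erf((0.0 - mean) / (sd * math.sqrt(2.0))))

# Hypothetical effect estimates (coefficient, standard error) for one
# metabolite in two studies; these are NOT values from the paper.
study1 = (-0.30, 0.10)
study2 = (-0.20, 0.15)

# Start from a vague prior, then reuse each posterior as the next prior.
vague_prior = (0.0, 10.0)
post1 = normal_update(*vague_prior, *study1)
post2 = normal_update(*post1, *study2)

print(f"after study 1: mean={post1[0]:.3f}, sd={post1[1]:.3f}")
print(f"after study 2: mean={post2[0]:.3f}, sd={post2[1]:.3f}")
print(f"P(effect < 0) after study 2: {prob_below_zero(*post2):.4f}")
```

In this sketch the second study alone would be weak evidence (its estimate is only about 1.3 standard errors from zero), but with the first study as the prior the posterior is narrower than either study's standard error, which is the mechanism the abstract describes.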

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Metabolites: 50 papers in training set, Top 0.1%, 29.5%
2. BMC Bioinformatics: 383 papers in training set, Top 0.2%, 19.8%
3. Bioinformatics: 1061 papers in training set, Top 4%, 6.8%

50% of probability mass above

4. Scientific Reports: 3102 papers in training set, Top 21%, 5.2%
5. PLOS ONE: 4510 papers in training set, Top 37%, 3.8%
6. Bioinformatics Advances: 184 papers in training set, Top 2%, 2.2%
7. Metabolomics: 11 papers in training set, Top 0.1%, 2.2%
8. PLOS Computational Biology: 1633 papers in training set, Top 13%, 2.2%
9. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.8%
10. eLife: 5422 papers in training set, Top 44%, 1.6%
11. Nature Communications: 4913 papers in training set, Top 58%, 1.0%
12. PeerJ: 261 papers in training set, Top 11%, 1.0%
13. Wellcome Open Research: 57 papers in training set, Top 1%, 1.0%
14. Frontiers in Physiology: 93 papers in training set, Top 5%, 0.8%
15. Journal of Translational Medicine: 46 papers in training set, Top 3%, 0.7%
16. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 10%, 0.7%
17. Analytical Chemistry: 205 papers in training set, Top 3%, 0.7%
18. Frontiers in Cellular and Infection Microbiology: 98 papers in training set, Top 6%, 0.7%
19. iScience: 1063 papers in training set, Top 36%, 0.7%
20. Pediatric Research: 18 papers in training set, Top 0.5%, 0.5%
21. Journal of Proteome Research: 215 papers in training set, Top 3%, 0.5%
22. eNeuro: 389 papers in training set, Top 10%, 0.5%
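
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below assumes the statement means "the smallest number of top-ranked journals whose probabilities sum to at least 50%"; that reading, and the helper name `journals_to_reach`, are assumptions. The probabilities are the first five listed values.

```python
# Predicted probabilities (in percent) for the five top-ranked journals,
# copied from the list above.
ranked = [
    ("Metabolites", 29.5),
    ("BMC Bioinformatics", 19.8),
    ("Bioinformatics", 6.8),
    ("Scientific Reports", 5.2),
    ("PLOS ONE", 3.8),
]

def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative mass) once the running sum of
    probabilities first reaches the threshold, or None if it never does."""
    total = 0.0
    for count, (_, prob) in enumerate(probabilities, start=1):
        total += prob
        if total >= threshold:
            return count, total
    return None

count, mass = journals_to_reach(ranked, 50.0)
print(f"{count} journals reach {mass:.1f}% of the probability mass")
```

The first two journals sum to 49.3%, just short of the threshold, so the third is needed and the cumulative mass at that point is 56.1%.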