High-dimensional Bayesian phenotype classification and model selection using genomic predictors

Linder, D. F.; Panchal, V.

2019-09-23 bioinformatics
10.1101/778472 bioRxiv
Motivation: In this paper we describe a Bayesian hierarchical model, termed PMMLogit, for classification and model selection in high-dimensional settings with binary phenotypes as outcomes. Posterior computation in the logistic model is known to be computationally demanding due to its non-conjugacy with common priors. We combine a Polya-Gamma based data augmentation strategy with recent results on Markov chain Monte Carlo (MCMC) techniques to develop an efficient and exact sampling strategy for the posterior computation. We use the resulting MCMC chain for model selection and choose the best combination(s) of genomic variables via posterior model probabilities. Further, a Bayesian model averaging (BMA) approach using the posterior mean, which averages across visited models, is shown to give superior prediction of phenotypes given genomic measurements.

Results: Using simulation studies, we compared the performance of the proposed method with other popular methods. Simulation results show that the proposed method is quite effective in selecting the true model and has better estimation and prediction accuracy than other methods. These observations are consistent with theoretical results that have been developed in the statistics literature on optimality for this class of priors. Application to two well-known datasets on colon cancer and leukemia identified genes that have been previously reported in the clinical literature to be related to the disease outcomes.

Availability: Source code is publicly available on GitHub at https://github.com/v-panchal/PMML.

Contact: dlinder@augusta.edu

Supplementary information: Supplementary data are available online.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Bioinformatics; 1061 papers in training set; Top 0.3%; 42.6%
2. BMC Bioinformatics; 383 papers in training set; Top 1%; 7.7%
50% of probability mass above
3. PLOS Computational Biology; 1633 papers in training set; Top 4%; 7.3%
4. Statistics in Medicine; 34 papers in training set; Top 0.1%; 6.8%
5. Biometrics; 22 papers in training set; Top 0.1%; 2.9%
6. PLOS ONE; 4510 papers in training set; Top 43%; 2.9%
7. The Annals of Applied Statistics; 15 papers in training set; Top 0.1%; 2.8%
8. Biostatistics; 21 papers in training set; Top 0.1%; 2.5%
9. Frontiers in Genetics; 197 papers in training set; Top 4%; 2.0%
10. BioData Mining; 15 papers in training set; Top 0.3%; 1.8%
11. Briefings in Bioinformatics; 326 papers in training set; Top 6%; 0.8%
12. Journal of Computational Biology; 37 papers in training set; Top 0.5%; 0.8%
13. Frontiers in Artificial Intelligence; 18 papers in training set; Top 0.7%; 0.8%
14. Scientific Reports; 3102 papers in training set; Top 74%; 0.8%
15. JAMIA Open; 37 papers in training set; Top 1%; 0.8%
16. The American Journal of Human Genetics; 206 papers in training set; Top 4%; 0.8%
17. G3: Genes, Genomes, Genetics; 222 papers in training set; Top 0.9%; 0.8%
18. Journal of Theoretical Biology; 144 papers in training set; Top 2%; 0.8%
19. Interface Focus; 14 papers in training set; Top 0.3%; 0.8%
20. Medical Decision Making; 10 papers in training set; Top 0.3%; 0.7%
21. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences; 12 papers in training set; Top 0.1%; 0.7%
22. American Journal of Epidemiology; 57 papers in training set; Top 2%; 0.7%
23. Imaging Neuroscience; 242 papers in training set; Top 4%; 0.7%
24. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 3%; 0.5%
25. PLOS Genetics; 756 papers in training set; Top 18%; 0.5%