High-dimensional Bayesian phenotype classification and model selection using genomic predictors

Linder, D. F.; Panchal, V.

2019-09-23 bioinformatics

Motivation

In this paper, we describe a Bayesian hierarchical model, termed PMMLogit, for classification and model selection in high-dimensional settings with binary phenotypes as outcomes. Posterior computation in the logistic model is known to be computationally demanding because the logistic likelihood is not conjugate to common priors. We combine a Polya-Gamma data augmentation strategy with recent results on Markov chain Monte Carlo (MCMC) techniques to develop an efficient and exact sampling strategy for posterior computation. We use the resulting MCMC chain for model selection and choose the best combination(s) of genomic variables via posterior model probabilities. Further, a Bayesian model averaging (BMA) approach using the posterior mean, which averages across visited models, is shown to give superior prediction of phenotypes from genomic measurements.

Results

Using simulation studies, we compared the performance of the proposed method with that of other popular methods. Simulation results show that the proposed method is quite effective in selecting the true model and has better estimation and prediction accuracy than the other methods. These observations are consistent with theoretical results on optimality for this class of priors that have been developed in the statistics literature. Application to two well-known datasets on colon cancer and leukemia identified genes that have previously been reported in the clinical literature to be related to the disease outcomes.

Availability

Source code is publicly available on GitHub at https://github.com/v-panchal/PMML.

Contact

dlinder@augusta.edu

Supplementary information

Supplementary data are available online.
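To make the Polya-Gamma data augmentation concrete: introducing latent variables omega_i ~ PG(1, x_i^T beta) and setting kappa_i = y_i - 1/2 makes the full conditional of the coefficients beta Gaussian, so a Gibbs sampler can alternate between drawing omega and drawing beta. The sketch below illustrates only that augmentation step under stated assumptions. It uses a plain Gaussian prior N(0, prior_var * I) rather than the priors used by PMMLogit, draws omega from a truncated sum-of-gammas approximation to PG(1, z) rather than an exact sampler, and performs no model selection or model averaging. The function names (`sample_pg1`, `pg_logistic_gibbs`, `posterior_mean_proba`) are hypothetical and are not taken from the PMML repository.

```python
import numpy as np

def sample_pg1(z, rng, n_terms=200):
    """Approximate PG(1, z) draws via a truncated sum-of-gammas series.

    PG(1, z) = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + z^2 / (4 pi^2)),
    with g_k ~ Gamma(1, 1). Truncating at n_terms makes this approximate, not exact.
    """
    z = np.asarray(z, dtype=float)
    k = np.arange(1, n_terms + 1)                                 # series index k = 1..K
    denom = (k - 0.5) ** 2 + z[:, None] ** 2 / (4 * np.pi ** 2)   # shape (n, K)
    g = rng.gamma(shape=1.0, scale=1.0, size=denom.shape)         # g_k ~ Gamma(1, 1)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)


def pg_logistic_gibbs(X, y, n_iter=1000, prior_var=10.0, seed=0):
    """Gibbs sampler for logistic regression under an ASSUMED N(0, prior_var * I) prior."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = y - 0.5                        # kappa_i = y_i - 1/2 for y_i in {0, 1}
    prior_prec = np.eye(p) / prior_var
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # Step 1: omega_i | beta ~ PG(1, x_i^T beta)  (approximate draw)
        omega = sample_pg1(X @ beta, rng)
        # Step 2: beta | omega, y ~ N(m, P^{-1}), with P = X^T Omega X + prior precision
        P = X.T @ (omega[:, None] * X) + prior_prec
        m = np.linalg.solve(P, X.T @ kappa)
        L = np.linalg.cholesky(P)          # P = L L^T
        # If e ~ N(0, I), then L^{-T} e has covariance P^{-1}
        beta = m + np.linalg.solve(L.T, rng.standard_normal(p))
        draws[t] = beta
    return draws


def posterior_mean_proba(X_new, draws, burn_in=0):
    """Average the predicted success probability over the retained posterior draws."""
    eta = X_new @ draws[burn_in:].T        # linear predictors, shape (m, S)
    return (1.0 / (1.0 + np.exp(-eta))).mean(axis=1)


# Hypothetical usage on simulated data (not data from the paper)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.5, -2.0, 0.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
draws = pg_logistic_gibbs(X, y, n_iter=500, seed=2)
beta_hat = draws[100:].mean(axis=0)        # posterior mean of beta after burn-in
probs = posterior_mean_proba(X[:5], draws, burn_in=100)
```

Sampling beta through a Cholesky factor of the precision matrix avoids forming an explicit inverse. An exact PG sampler, such as the one proposed alongside the augmentation by Polson, Scott and Windle, would replace `sample_pg1` in a faithful implementation; the truncation here only keeps the sketch short and dependency-free.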