
Bayesian semi-nonnegative matrix tri-factorization to identify pathways associated with cancer phenotypes

Park, S.; Kar, N.; Cheong, J.-H.; Hwang, T. H.

2019-08-20 bioinformatics
10.1101/739110 bioRxiv

Accurate identification of pathways associated with cancer phenotypes (e.g., cancer sub-types and treatment outcome) could lead to the discovery of reliable prognostic and/or predictive biomarkers for better patient stratification and treatment guidance. In our previous work, we showed that non-negative matrix tri-factorization (NMTF) can be successfully applied to identify pathways associated with specific cancer types or disease classes as prognostic and predictive biomarkers. However, one key limitation of non-negative factorization methods, including various non-negative bi-factorization methods, is their inability to handle input data that contain negative values. For example, many molecular datasets consist of real values, both positive and negative (e.g., normalized or log-transformed gene expression data, where a negative value represents down-regulated expression of a gene), and are therefore not suitable input for these algorithms. In addition, most previous methods provide only a single point estimate and hence cannot deal with uncertainty effectively.

To address these limitations, we propose a Bayesian semi-nonnegative matrix tri-factorization method to identify pathways associated with cancer phenotypes from a real-valued input matrix, e.g., gene expression values. Motivated by semi-nonnegative factorization, we allow one of the factor matrices, the centroid matrix, to be real-valued so that each centroid can express either up- or down-regulation of the member genes in a pathway. In addition, we place structured spike-and-slab priors (encoded with the pathways and a gene-gene interaction (GGI) network) on the centroid matrix, so that even genes not initially contained in the pathways (owing to the incompleteness of current pathway databases) can be involved in the factorization stochastically, specifically if those genes are connected to member genes of the pathways in the GGI network. We also present update rules for the posterior distributions in the framework of variational inference. As a fully Bayesian method, our proposed method has several advantages over current NMTF methods, which we demonstrate in experiments on synthetic datasets. Using The Cancer Genome Atlas (TCGA) gastric cancer dataset and a metastatic gastric cancer immunotherapy clinical-trial dataset, we show that our method could identify biologically and clinically relevant pathways associated with the molecular sub-types and immunotherapy response, respectively. Finally, we show that the pathways identified by the proposed method could be used as prognostic biomarkers to stratify patients with distinct survival outcomes in two independent validation datasets. Additional information and code can be found at https://github.com/parks-cs-ccf/BayesianSNMTF.
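The abstract builds on semi-nonnegative factorization (semi-NMF), in which one factor (the centroid matrix) is left unconstrained so it can carry negative values. As a rough illustration only, the sketch below implements the classic point-estimate semi-NMF multiplicative updates of Ding, Li and Jordan for the bi-factorization X ≈ F Gᵀ (F real-valued, G ≥ 0). It is not the authors' Bayesian tri-factorization: there is no third factor, no spike-and-slab prior, no pathway or GGI information, and no variational inference. The function names (`semi_nmf`, `_pos`, `_neg`), the genes-by-samples orientation of X, the synthetic data, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def _pos(A):
    # Elementwise positive part: (|A| + A) / 2
    return (np.abs(A) + A) / 2.0


def _neg(A):
    # Elementwise negative part: (|A| - A) / 2, so that A = _pos(A) - _neg(A)
    return (np.abs(A) - A) / 2.0


def semi_nmf(X, k, n_iter=500, seed=0, eps=1e-12):
    """Hypothetical point-estimate semi-NMF: X ≈ F @ G.T, F real-valued, G >= 0.

    X: (genes x samples) real-valued matrix (assumed orientation); may contain negatives.
    Returns F (genes x k, unconstrained "centroids") and G (samples x k, nonnegative).
    """
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], k)) + 0.1  # strictly positive initialization

    def fit_F(G):
        # Unconstrained least squares for the centroids: F = X G (G^T G)^+
        return X @ G @ np.linalg.pinv(G.T @ G)

    for _ in range(n_iter):
        F = fit_F(G)
        XtF = X.T @ F
        FtF = F.T @ F
        # Multiplicative update: every term is nonnegative, so G stays >= 0
        numer = _pos(XtF) + G @ _neg(FtF)
        denom = _neg(XtF) + G @ _pos(FtF) + eps
        G = G * np.sqrt(numer / denom)
    return fit_F(G), G


# Hypothetical synthetic example: real-valued centroids, nonnegative memberships
rng = np.random.default_rng(1)
F_true = rng.normal(size=(50, 3))   # positive = up-regulation, negative = down-regulation
G_true = rng.random((20, 3))        # nonnegative sample memberships
X = F_true @ G_true.T               # contains negative entries, unlike standard NMF input

F, G = semi_nmf(X, k=3)
rel_err = np.linalg.norm(X - F @ G.T) / np.linalg.norm(X)
print(f"min(X) = {X.min():.2f}, min(G) = {G.min():.3f}, relative error = {rel_err:.4f}")
```

Because F is refit by least squares for every G, the reconstruction F Gᵀ projects each gene's profile onto the span of G's columns, and in Ding et al.'s analysis the residual does not increase under these updates. The signed columns of F are what let a centroid represent up- or down-regulation; the proposed Bayesian method additionally places structured priors on that matrix and infers posteriors rather than a single point estimate.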

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 23.7% (1061 papers in training set; Top 0.9%)
2. PLOS Computational Biology: 10.6% (1633 papers in training set; Top 3%)
3. The Annals of Applied Statistics: 5.1% (15 papers in training set; Top 0.1%)
4. Biometrics: 5.1% (22 papers in training set; Top 0.1%)
5. BMC Bioinformatics: 4.2% (383 papers in training set; Top 2%)
6. Frontiers in Genetics: 3.4% (197 papers in training set; Top 2%)
50% of probability mass above
7. PLOS ONE: 3.2% (4510 papers in training set; Top 42%)
8. NeuroImage: 2.2% (813 papers in training set; Top 3%)
9. Medical Image Analysis: 2.2% (33 papers in training set; Top 0.5%)
10. Nucleic Acids Research: 1.9% (1128 papers in training set; Top 9%)
11. Scientific Reports: 1.8% (3102 papers in training set; Top 56%)
12. Human Brain Mapping: 1.6% (295 papers in training set; Top 3%)
13. Nature Communications: 1.6% (4913 papers in training set; Top 53%)
14. Statistics in Medicine: 1.6% (34 papers in training set; Top 0.2%)
15. npj Systems Biology and Applications: 1.4% (99 papers in training set; Top 1%)
16. Biostatistics: 1.4% (21 papers in training set; Top 0.1%)
17. Communications Biology: 1.4% (886 papers in training set; Top 12%)
18. Briefings in Bioinformatics: 1.3% (326 papers in training set; Top 5%)
19. Journal of Computational Biology: 1.3% (37 papers in training set; Top 0.3%)
20. Computational and Structural Biotechnology Journal: 0.9% (216 papers in training set; Top 7%)
21. Patterns: 0.8% (70 papers in training set; Top 2%)
22. iScience: 0.8% (1063 papers in training set; Top 30%)
23. PLOS Genetics: 0.8% (756 papers in training set; Top 14%)
24. Bioinformatics Advances: 0.7% (184 papers in training set; Top 5%)
25. IEEE Transactions on Computational Biology and Bioinformatics: 0.5% (17 papers in training set; Top 0.9%)
26. Quantitative Biology: 0.5% (11 papers in training set; Top 1.0%)
27. NAR Genomics and Bioinformatics: 0.5% (214 papers in training set; Top 4%)
28. Frontiers in Immunology: 0.5% (586 papers in training set; Top 9%)
29. Proceedings of the National Academy of Sciences: 0.5% (2130 papers in training set; Top 48%)