
Joint linear modeling of transcriptomics and proteomics is predictive of cancer metastasis

Sharma, R.; Meimetis, N.; Begzati, A.; Nagar, S. D.; Kellman, B.; Baghdassarian, H. M.

2025-02-20 systems biology
10.1101/2025.02.15.638428 bioRxiv

A central goal of conducting omics measurements is to understand how molecular features inform higher-order cell- and tissue-level phenotypes. In particular, multi-omics offers insights into how information encoded by the genome is coordinated through biological layers, resulting in functional outputs [1]. Due to myriad post-transcriptional regulatory processes, the coordination between mRNA and protein cannot be simply reduced to gene-wise correlation. Yet, both modalities have been shown to serve as representations of biological state, and multi-omics integration has been used to improve these representations. Multi-omics approaches typically do not focus on how mRNA and protein features coordinate, but rather use the additional information for improved prediction or feature selection. Here, instead, we show that standard linear machine learning models provide an understanding of transcriptomic and proteomic coordination in the context of a biological phenotype of interest, in this case cancer metastasis. We find that, in the context of metastasis, a select subset of proteomic features, reflecting a more concentrated signal relative to the broadly distributed transcriptomic signal, offers additional information to that encoded by transcriptomics, as demonstrated by improved model performance when integrating the two modalities and by the relative feature importance of proteomics. Top features show a depletion of gene-product overlap across modalities, indicating that the model primarily leverages instances in which the two modalities provide complementary information with respect to phenotype. However, in instances when both modalities are selected for a given gene product, there is high information consistency that synergistically bolsters phenotype prediction.
Altogether, by using model fits that relate both modalities to phenotype, we observe a nuanced coordination of protein and mRNA: both modalities tend to provide consistent information about phenotype, yet there remain benefits to incorporating a combination of complementary and reinforcing signals across modalities.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Cell Systems | 167 papers in training set | Top 0.1% | 23.5%
2. Molecular Systems Biology | 142 papers in training set | Top 0.1% | 19.5%
3. Nature Communications | 4913 papers in training set | Top 16% | 10.9%
50% of probability mass above
4. iScience | 1063 papers in training set | Top 1% | 6.7%
5. npj Systems Biology and Applications | 99 papers in training set | Top 0.3% | 5.1%
6. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 24% | 2.9%
7. Bioinformatics | 1061 papers in training set | Top 6% | 2.7%
8. PLOS Computational Biology | 1633 papers in training set | Top 15% | 1.8%
9. Nature Machine Intelligence | 61 papers in training set | Top 2% | 1.8%
10. Cell Reports | 1338 papers in training set | Top 23% | 1.7%
11. eLife | 5422 papers in training set | Top 48% | 1.3%
12. Nature Methods | 336 papers in training set | Top 5% | 1.3%
13. Nucleic Acids Research | 1128 papers in training set | Top 16% | 0.8%
14. Scientific Reports | 3102 papers in training set | Top 72% | 0.8%
15. Nature | 575 papers in training set | Top 14% | 0.8%
16. Nature Neuroscience | 216 papers in training set | Top 6% | 0.8%
17. Nature Biotechnology | 147 papers in training set | Top 7% | 0.8%
18. Genome Biology | 555 papers in training set | Top 7% | 0.8%
19. Communications Biology | 886 papers in training set | Top 22% | 0.8%
20. Cell Reports Methods | 141 papers in training set | Top 5% | 0.8%
21. Cancer Cell | 38 papers in training set | Top 2% | 0.8%
22. Science Advances | 1098 papers in training set | Top 29% | 0.8%
23. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 11% | 0.5%
24. Patterns | 70 papers in training set | Top 3% | 0.5%
25. Heliyon | 146 papers in training set | Top 9% | 0.5%
26. Cell Genomics | 162 papers in training set | Top 8% | 0.5%
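
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The following is a minimal sketch, not the page's own computation: the function name `journals_needed` and the 50% default threshold are assumptions for illustration, and the probabilities are copied from the first five entries of the list above.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for the five top-ranked journals,
# copied from the list above.
probs = [23.5, 19.5, 10.9, 6.7, 5.1]


def journals_needed(probabilities, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    sum to reach the threshold (hypothetical helper, for illustration)."""
    for count, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return count
    return None  # threshold never reached with the given entries


print(journals_needed(probs))  # 3
```

The first two entries sum to 43.0%, which is below the threshold; adding the third brings the total to roughly 53.9%, so three journals are needed to pass 50%.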