
VaLPAS: Leveraging variation in experimental multi-omics data to elucidate protein function

Mahlich, Y.; Ross, D. H.; Monteiro, L.; McDermott, J. E.

2026-03-30 bioinformatics
10.64898/2026.03.26.712966 bioRxiv

Motivation: Despite continuing advances in sequencing and computational function determination, large parts of the studied gene, protein, and metabolite space remain functionally undetermined. Most function assignment is driven by homology searches and annotation transfer from known, extensively studied proteins, but it often fails to leverage available experimental omics data generated by technologies such as mass spectrometry.

Results: The VaLPAS (Variation-Leveraged Phenomic Association Screen) framework is available as a Python package and provides a user-friendly platform for calculating associations between expression patterns of genes or proteins in multi-omic datasets using various statistical and learning methods. The goal of this approach is to shed light on the functional dark matter of protein space by inferring previously unknown functions of molecules through guilt by association with molecules of known function. We present results demonstrating the utility of VaLPAS in identifying high-confidence predictions for a subset of genes/proteins of unknown function in a previously published multi-omics dataset from the oleaginous yeast Rhodotorula toruloides.

Availability: VaLPAS is written in Python. The code is hosted on GitHub (https://github.com/PNNL-Predictive-Phenomics/valpas/).
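The guilt-by-association idea in the abstract can be illustrated with a minimal sketch. This is not the VaLPAS API: the functions `pearson` and `predict_function`, the correlation threshold `min_r`, and the correlation-weighted voting scheme are all hypothetical choices made for illustration. The sketch assumes each molecule has a quantitative profile measured across the same set of samples, scores association with Pearson correlation (one of many possible statistics), and transfers annotations from strongly co-varying, annotated molecules to an unannotated query.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: no variation to leverage
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def predict_function(query, profiles, annotations, min_r=0.8):
    """Hypothetical guilt-by-association: each annotated molecule whose profile
    correlates with the query at r >= min_r votes for its annotation, weighted by r.
    Returns (annotation, summed weight) pairs, strongest first."""
    votes = Counter()
    for name, profile in profiles.items():
        if name == query or name not in annotations:
            continue  # skip the query itself and unannotated molecules
        r = pearson(profiles[query], profile)
        if r >= min_r:
            votes[annotations[name]] += r
    return votes.most_common()


# Toy data (made up for illustration): abundance across five samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # anti-correlated with geneA
    "unknown1": [1.1, 2.0, 2.9, 4.2, 5.1],
}
annotations = {
    "geneA": "lipid synthesis",
    "geneB": "lipid synthesis",
    "geneC": "ribosome biogenesis",
}
result = predict_function("unknown1", profiles, annotations)
print(result)
```

Here `unknown1` co-varies with the two "lipid synthesis" genes and is anti-correlated with `geneC`, so only the lipid-synthesis annotation receives votes. A real screen would also need multiple-testing control and a choice of association measure suited to the data, which the abstract leaves to VaLPAS's "various statistical and learning methods".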

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Bioinformatics (1061 papers in training set): Top 0.9%, 26.1%
2. Bioinformatics Advances (184 papers in training set): Top 0.1%, 18.8%
3. BMC Bioinformatics (383 papers in training set): Top 0.3%, 18.8%
(50% of probability mass above this line)
4. Journal of Proteome Research (215 papers in training set): Top 0.4%, 7.2%
5. NAR Genomics and Bioinformatics (214 papers in training set): Top 0.5%, 4.2%
6. PLOS Computational Biology (1633 papers in training set): Top 9%, 3.6%
7. Computational and Structural Biotechnology Journal (216 papers in training set): Top 5%, 1.5%
8. Nucleic Acids Research (1128 papers in training set): Top 15%, 1.0%
9. GigaScience (172 papers in training set): Top 2%, 0.9%
10. PLOS ONE (4510 papers in training set): Top 64%, 0.9%
11. Molecular & Cellular Proteomics (158 papers in training set): Top 2%, 0.9%
12. Scientific Reports (3102 papers in training set): Top 74%, 0.8%
13. npj Systems Biology and Applications (99 papers in training set): Top 3%, 0.7%
14. Frontiers in Microbiology (375 papers in training set): Top 9%, 0.7%
15. Analytical Chemistry (205 papers in training set): Top 3%, 0.7%
16. mSystems (361 papers in training set): Top 8%, 0.7%
17. Briefings in Bioinformatics (326 papers in training set): Top 7%, 0.6%
18. Synthetic Biology (21 papers in training set): Top 0.2%, 0.6%
19. Journal of Molecular Biology (217 papers in training set): Top 4%, 0.6%
20. Biochimica et Biophysica Acta (BBA) - Bioenergetics (17 papers in training set): Top 0.2%, 0.6%
21. Open Biology (95 papers in training set): Top 3%, 0.6%
22. BMC Genomics (328 papers in training set): Top 8%, 0.5%
23. Genome Biology (555 papers in training set): Top 9%, 0.5%