On why and how to encode probability distributions on graph representations of omics data: enhancing predictive tasks and knowledge discovery

Goncalves, D. M.; Patricio, A.; Costa, R. S.; Henriques, R.

2026-02-19 bioinformatics

10.64898/2026.02.19.706756 bioRxiv

The growing availability and complexity of omics data have driven the development of specialized algorithms for modeling molecular systems. Although graph-based learning methods effectively represent biological interactions, they often neglect the statistical information embedded in node and edge annotations. To address this limitation, we propose a novel graph-based framework that integrates structured statistical distributions into nodes and edges, capturing probabilistic characteristics of molecular relationships. We evaluate the proposed approach on omics datasets from five cancer types across multiple clinical outcomes, including survivability and primary tumor site. Results demonstrate predictive performance comparable to established machine learning baselines. Beyond prediction, the statistically enriched graph representations enable the identification and characterization of regulatory modules associated with clinical outcomes, enhancing biological interpretability. These findings suggest that incorporating structured statistical information into graph representations provides a competitive and interpretable framework for predictive modeling and knowledge discovery in complex diseases.
