
gGN: learning to represent graph nodes as low-rank Gaussian distributions

Edera, A. A.; Stegmayer, G.; Milone, D. H.

2022-11-17 bioinformatics
10.1101/2022.11.15.516704 bioRxiv

Unsupervised learning of node representations from knowledge graphs is critical for numerous downstream tasks, ranging from large-scale graph analysis to measuring semantic similarity between nodes. This study presents gGN as a novel representation that defines graph nodes as Gaussian distributions. Unlike existing representations that approximate such distributions using diagonal covariance matrices, our proposal approximates them using low-rank perturbations. We demonstrate that this low-rank approximation is more expressive and better suited to represent complex asymmetric relations between nodes. In addition, we provide a computationally affordable algorithm for learning the low-rank representations in an unsupervised fashion. This learning algorithm uses a novel loss function based on the reverse Kullback-Leibler divergence and two ranking metrics whose joint minimization results in node representations that preserve not only node depths but also local and global asymmetric relationships between nodes. We assessed the representation power of the low-rank approximation with an in-depth systematic empirical study. The results show that our proposal was significantly better than the diagonal approximation for preserving graph structures. Moreover, gGN also outperformed 17 methods on the downstream task of measuring semantic similarity between graph nodes.
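To make the idea of low-rank Gaussian node representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes each node's covariance is parameterized as a diagonal plus a low-rank term, Sigma = diag(d) + P P^T, which is one common reading of "low-rank perturbation"; the abstract does not state the exact form. The function names `low_rank_cov` and `kl_gaussians` and all numeric values are hypothetical. The sketch evaluates the standard closed-form Kullback-Leibler divergence between two Gaussians in both directions to show that it is asymmetric, which is the property that makes it usable for asymmetric relations between nodes. Which direction the paper calls "reverse" is not specified in the abstract.

```python
import numpy as np


def low_rank_cov(diag, factor):
    """Build diag(d) + P P^T (assumed parameterization, not from the paper)."""
    return np.diag(diag) + factor @ factor.T


def kl_gaussians(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    k = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    maha_term = diff @ np.linalg.solve(cov1, diff)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + maha_term - k + logdet1 - logdet0)


# Two hypothetical nodes embedded in 3 dimensions with rank-1 perturbations.
mu_a = np.array([0.0, 0.0, 0.0])
cov_a = low_rank_cov(np.array([1.0, 1.0, 1.0]), np.array([[1.0], [1.0], [0.0]]))

mu_b = np.array([1.0, 0.0, 0.0])
cov_b = low_rank_cov(np.array([0.5, 0.5, 0.5]), np.array([[0.0], [1.0], [1.0]]))

kl_ab = kl_gaussians(mu_a, cov_a, mu_b, cov_b)
kl_ba = kl_gaussians(mu_b, cov_b, mu_a, cov_a)
kl_aa = kl_gaussians(mu_a, cov_a, mu_a, cov_a)

print("KL(a || b):", kl_ab)
print("KL(b || a):", kl_ba)
print("KL(a || a):", kl_aa)
```

The off-diagonal entries introduced by P P^T let the covariance capture correlations between embedding dimensions that a purely diagonal covariance cannot. This sketch builds the full matrix for clarity; a practical implementation would more likely exploit the low-rank structure to avoid dense solves.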

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Bioinformatics | 1061 | Top 1% | 22.8%
2 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.1% | 6.9%
3 | PLOS ONE | 4510 | Top 27% | 6.4%
4 | Nature Communications | 4913 | Top 32% | 4.9%
5 | Scientific Reports | 3102 | Top 23% | 4.9%
6 | Journal of Computational Biology | 37 | Top 0.1% | 4.0%
7 | BMC Bioinformatics | 383 | Top 2% | 4.0%
(50% of probability mass above)
8 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 0.5% | 3.6%
9 | Bioinformatics Advances | 184 | Top 1% | 3.6%
10 | PLOS Computational Biology | 1633 | Top 12% | 2.6%
11 | Proceedings of the National Academy of Sciences | 2130 | Top 25% | 2.6%
12 | IEEE Access | 31 | Top 0.2% | 2.5%
13 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 | Top 0.1% | 2.1%
14 | Briefings in Bioinformatics | 326 | Top 4% | 1.7%
15 | Frontiers in Genetics | 197 | Top 6% | 1.3%
16 | Computational and Structural Biotechnology Journal | 216 | Top 6% | 1.2%
17 | European Journal of Human Genetics | 49 | Top 0.9% | 1.1%
18 | Cell Systems | 167 | Top 10% | 1.0%
19 | Advanced Science | 249 | Top 17% | 0.8%
20 | iScience | 1063 | Top 29% | 0.8%
21 | Nucleic Acids Research | 1128 | Top 17% | 0.8%
22 | Journal of The Royal Society Interface | 189 | Top 4% | 0.8%
23 | Journal of Molecular Biology | 217 | Top 3% | 0.8%
24 | BioData Mining | 15 | Top 1.0% | 0.7%
25 | GigaScience | 172 | Top 4% | 0.7%
26 | Genomics, Proteomics & Bioinformatics | 171 | Top 7% | 0.7%
27 | Algorithms for Molecular Biology | 15 | Top 0.1% | 0.5%
28 | Neurocomputing | 13 | Top 0.8% | 0.5%
29 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.5%