
DisGeneFormer: Precise Disease Gene Prioritization by Integrating Local and Global Graph Attention

Koeksal, R.; Fritz, A.; Kumar, A.; Schmidts, M.; Tran, V. D.; Backofen, R.

2026-03-14 bioinformatics
10.64898/2026.03.11.711106 bioRxiv

Identifying genes associated with human diseases is essential for effective diagnosis and treatment, but identifying disease-causing genes experimentally is time-consuming and expensive. Computational prioritization methods aim to streamline this process by ranking genes according to their likelihood of association with a given disease. However, existing methods often report long ranked lists of thousands of candidate disease genes that contain many false positives. Such lists fail to meet the practical needs of clinicians, who require shorter and more precise candidate lists. To address this problem, we introduce DisGeneFormer (DGF), an end-to-end disease-gene prioritization pipeline. Our approach is based on two distinct graph representations, one modeling gene relationships and the other modeling disease relationships. Each graph is first processed separately by graph attention and then jointly by a transformer module, combining within-graph and cross-graph knowledge through local and global attention. We propose an evaluation pipeline based on the precision of the top K ranked genes, with K set to clinically feasible values between 5 and 50, using only experimentally verified associations as ground truth. Our evaluation demonstrates that DGF substantially outperforms existing methods. We additionally assess the influence of the negative data sampling strategy and analyze how graph topology and features affect the performance of our model.
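
The evaluation criterion described above, precision of the top K ranked genes, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names `precision_at_k` and `mean_precision_at_k`, the dictionary layout of rankings and ground truth, dividing by K even when a ranking is shorter than K, and averaging uniformly over diseases are all hypothetical choices.

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(ranked_genes: Sequence[str], true_genes: Set[str], k: int) -> float:
    """Fraction of the top-k ranked genes that are verified disease genes.

    Hypothetical helper: divides by k even if fewer than k genes are ranked.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_genes[:k]
    hits = sum(1 for gene in top_k if gene in true_genes)
    return hits / k


def mean_precision_at_k(
    rankings: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average precision@k over diseases that have both a ranking and ground truth.

    Hypothetical aggregation: every disease is weighted equally.
    """
    scores = [
        precision_at_k(rankings[disease], genes, k)
        for disease, genes in ground_truth.items()
        if disease in rankings
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy identifiers for illustration only; not data from the paper.
    rankings = {
        "D1": ["G1", "G2", "G3", "G4", "G5"],
        "D2": ["G9", "G3", "G7", "G8", "G6"],
    }
    truth = {"D1": {"G1", "G3"}, "D2": {"G7"}}
    for k in (2, 5):
        print(k, mean_precision_at_k(rankings, truth, k))
```

Restricting K to small values such as 5 to 50 rewards models that place verified genes at the very top of the list, which matches the clinical goal of short candidate lists stated above.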

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set, Top 1%, probability 22.8%
2. Bioinformatics Advances: 184 papers in training set, Top 0.2%, probability 8.5%
3. Nucleic Acids Research: 1128 papers in training set, Top 4%, probability 4.4%
4. BMC Bioinformatics: 383 papers in training set, Top 2%, probability 4.4%
5. Genome Research: 409 papers in training set, Top 0.8%, probability 4.0%
6. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set, Top 0.1%, probability 3.7%
7. Scientific Reports: 3102 papers in training set, Top 35%, probability 3.6%
50% of probability mass above this line
8. Briefings in Bioinformatics: 326 papers in training set, Top 2%, probability 3.1%
9. PLOS ONE: 4510 papers in training set, Top 45%, probability 2.6%
10. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 0.7%, probability 2.5%
11. Nature Communications: 4913 papers in training set, Top 46%, probability 2.1%
12. Cell Systems: 167 papers in training set, Top 5%, probability 2.1%
13. PLOS Computational Biology: 1633 papers in training set, Top 13%, probability 2.1%
14. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 3%, probability 2.1%
15. Genome Medicine: 154 papers in training set, Top 4%, probability 1.8%
16. Frontiers in Genetics: 197 papers in training set, Top 5%, probability 1.7%
17. European Journal of Human Genetics: 49 papers in training set, Top 0.7%, probability 1.5%
18. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, probability 1.5%
19. BioData Mining: 15 papers in training set, Top 0.4%, probability 1.3%
20. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 37%, probability 1.2%
21. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 32 papers in training set, Top 0.3%, probability 1.2%
22. Genome Biology: 555 papers in training set, Top 6%, probability 1.1%
23. Journal of Biomedical Informatics: 45 papers in training set, Top 1%, probability 1.1%
24. BMC Medical Genomics: 36 papers in training set, Top 0.9%, probability 1.0%
25. Nature Methods: 336 papers in training set, Top 5%, probability 1.0%
26. iScience: 1063 papers in training set, Top 26%, probability 0.9%
27. Database: 51 papers in training set, Top 1%, probability 0.7%
28. Advanced Science: 249 papers in training set, Top 20%, probability 0.7%
29. Patterns: 70 papers in training set, Top 3%, probability 0.7%
30. Journal of Molecular Biology: 217 papers in training set, Top 4%, probability 0.7%
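
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked from the listed probabilities. The sketch below (the helper name `journals_to_reach` is hypothetical) accumulates the percentages in rank order and returns the smallest number of journals whose combined probability reaches a target. With the first ten listed values it returns 7: the cumulative sum is 47.8% after six journals and 51.4% after seven. The displayed percentages are rounded, so the crossing point is approximate.

```python
from typing import Optional, Sequence

# Predicted probabilities (percent) of the ten highest-ranked journals, copied from the list above.
TOP_TEN_PROBS = [22.8, 8.5, 4.4, 4.4, 4.0, 3.7, 3.6, 3.1, 2.6, 2.5]


def journals_to_reach(mass_percent: float, probabilities: Sequence[float]) -> Optional[int]:
    """Return how many top-ranked journals are needed for the cumulative
    probability to reach mass_percent, or None if the list never reaches it.

    Hypothetical helper; assumes probabilities are already sorted by rank.
    """
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= mass_percent:
            return count
    return None


if __name__ == "__main__":
    print(journals_to_reach(50, TOP_TEN_PROBS))
```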