
Pro-GAT: Reconnecting Fragmented PROTACs Using Graph Attention Transformer

Vemuri, S.; Bijigiri, L. P.; Gogte, S.; Kondaparthi, V.

2026-02-23 bioinformatics
10.64898/2026.02.22.707266 bioRxiv

PROTACs work by bringing together a protein-of-interest ligand and an E3 ligase recruiter to trigger targeted degradation. However, diffusion-based generative models frequently produce chemically invalid or disconnected linker structures that satisfy global geometric constraints but violate local bonding requirements. These models operate in continuous coordinate space and therefore lack explicit mechanisms for enforcing discrete chemical connectivity under fixed-anchor constraints. Invalid, disconnected outputs are recurrent rather than rare, so naive resampling is not an effective way to obtain valid chimeras. Pro-GAT is a graph attention-based framework for geometry-preserving molecular graph repair. It operates on chemically disconnected, diffusion-generated PROTAC candidates and predicts bounded coordinate corrections and constrained atom-type modifications using geometry-aware graph attention network (GAT) layers. The model is trained on PROTAC datasets with artificially added disconnections so that it can correct the systematic connectivity failures of diffusion-based PROTAC generation with fixed anchors. When combined with DiffPROTACs and DiffLinker, Pro-GAT improves the percentage of chemically valid candidates in the aggregated output from 76.70% to 83.92% and from 63.16% to 68.73%, respectively, while maintaining uniqueness levels of 80.18% and 63.80% among valid candidates, thus recovering usable PROTAC candidates from invalid diffusion samples. In a case study of the 7Z76 ternary complex, Pro-GAT was used to repair samples generated by DiffPROTACs and DiffLinker, yielding connected chimeras whose docking scores were comparable to that of the original 7Z76 structure.
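To illustrate what "disconnected" means at the graph level, the following is a minimal sketch, not the authors' method: it treats a candidate as a set of atom indices and bonds and counts connected fragments with union-find. The function names and the toy atom numbering are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def connected_fragments(num_atoms, bonds):
    """Group atom indices into connected fragments using union-find."""
    parent = list(range(num_atoms))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for atom in range(num_atoms):
        groups[find(atom)].append(atom)
    return sorted(groups.values())


def is_connected(num_atoms, bonds):
    """A candidate is connected when all atoms fall into one fragment."""
    return len(connected_fragments(num_atoms, bonds)) == 1


# Hypothetical toy candidate (assumption, not data from the paper):
# atoms 0-2 form one ligand, 3-4 the linker, 5-6 the other ligand.
broken_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)]
# Adding the missing linker-to-ligand bond reconnects the chimera.
repaired_bonds = broken_bonds + [(4, 5)]

print(connected_fragments(7, broken_bonds))
print(is_connected(7, repaired_bonds))
```

A check of this kind only detects a missing connection. Deciding which atoms to move or retype so that a chemically valid bond can form is the part the abstract attributes to the GAT-based repair model.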

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Bioinformatics (1061 papers in training set): Top 0.8%, 26.7%
2. Journal of Chemical Information and Modeling (207 papers in training set): Top 0.3%, 15.2%
3. Journal of Cheminformatics (25 papers in training set): Top 0.1%, 10.4%

50% of probability mass above

4. Nature Communications (4913 papers in training set): Top 24%, 7.4%
5. Bioinformatics Advances (184 papers in training set): Top 0.6%, 5.0%
6. Proteins: Structure, Function, and Bioinformatics (82 papers in training set): Top 0.2%, 3.7%
7. Computational and Structural Biotechnology Journal (216 papers in training set): Top 1%, 3.7%
8. Briefings in Bioinformatics (326 papers in training set): Top 3%, 2.2%
9. Nucleic Acids Research (1128 papers in training set): Top 10%, 1.8%
10. Journal of Molecular Biology (217 papers in training set): Top 1%, 1.7%
11. PLOS Computational Biology (1633 papers in training set): Top 15%, 1.7%
12. BMC Bioinformatics (383 papers in training set): Top 6%, 1.1%
13. Nature Methods (336 papers in training set): Top 5%, 1.0%
14. PLOS ONE (4510 papers in training set): Top 62%, 1.0%
15. Scientific Reports (3102 papers in training set): Top 69%, 1.0%
16. Journal of Chemical Theory and Computation (126 papers in training set): Top 0.7%, 0.9%
17. Cell Reports Methods (141 papers in training set): Top 4%, 0.8%
18. Chemical Science (71 papers in training set): Top 2%, 0.8%
19. Protein Science (221 papers in training set): Top 2%, 0.8%
20. Structure (175 papers in training set): Top 3%, 0.8%
21. eLife (5422 papers in training set): Top 60%, 0.7%
22. Nature Computational Science (50 papers in training set): Top 2%, 0.7%
23. Advanced Science (249 papers in training set): Top 23%, 0.5%
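
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. This is a sketch under the assumption that the last percentage listed for each journal is its predicted probability; the function name is hypothetical.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative total) for the smallest leading set of
    journals whose probabilities sum to at least the threshold."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None  # threshold never reached


# Probabilities (in percent) of the five top-ranked journals listed above.
top_probabilities = [26.7, 15.2, 10.4, 7.4, 5.0]

count, total = journals_to_reach(top_probabilities, 50.0)
print(count, round(total, 1))
```

With these values the first two journals sum to 41.9%, so the third is needed to cross the 50% mark, for a cumulative 52.3%.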