GraphTox: A Semi-Supervised Pre-Trained Framework for Peptide Toxicity Prediction using Geometric Graph Transformer and LORA-Based Finetuning

Bhaduri, S.; Das, D.; Mitra, P.

2026-05-27 bioinformatics
10.64898/2026.05.23.727225 bioRxiv
Peptides are widely used as therapeutic candidates in drug discovery and biotechnology because they are specific, effective, and relatively inexpensive to produce. Their applications include drug development, vaccines, and antimicrobial treatments. However, peptide toxicity remains a major concern because it can cause adverse effects such as membrane rupture, haemolysis, tissue damage, and unwanted immunological responses. Early detection of toxic peptide candidates is therefore vital for developing safe and effective therapies. Current computational methods for predicting peptide toxicity rely largely on hand-crafted sequence descriptors or sequence-only deep learning architectures, which may not fully account for the underlying three-dimensional structural determinants of toxicity. We introduce GraphTox, a structure-aware geometric deep learning framework that combines self-supervised graph representation learning with hierarchical structural modelling to predict peptide toxicity. The framework learns geometry-aware embeddings from peptide structural graphs through self-supervised masked residue reconstruction, using a Masked Graph Autoencoder (MGAE) built on a Geometric Graph Transformer (GGT) encoder. The pretrained structural representations are cross-fused via a multi-scale U-Net architecture to capture both local residue-level interactions and global conformational patterns associated with peptide toxicity. GraphTox explicitly models spatial relationships between residues, and thereby captures structural features that sequence-based predictors generally neglect, such as residue clustering, hydrophobic interactions, and electrostatic organization. On benchmark datasets, our framework outperforms existing state-of-the-art methods in both predictive performance and interpretability. Our hybrid hierarchical structural modelling framework offers an improved computational platform for predicting peptide toxicity and for expediting the development of safer peptide therapies. https://github.com/debraj-55555/GraphTox
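
The two ingredients the abstract names for pretraining, a peptide structural graph and masked residue reconstruction, can be illustrated with a minimal sketch. It is not taken from the GraphTox repository: the function names `build_residue_graph` and `mask_residues`, the use of one C-alpha coordinate per residue, the 8 Å distance cutoff, the mask token "X", and the mask rate are all assumptions chosen for illustration. The sketch only prepares the inputs and targets; the encoder that would reconstruct the hidden residues is omitted.

```python
import math
import random


def build_residue_graph(coords, cutoff=8.0):
    """Connect residue pairs whose coordinates lie within `cutoff` angstroms.

    `coords` holds one (x, y, z) point per residue (assumed here to be the
    C-alpha atom). Returns a list of (i, j, distance) edges with i < j.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                edges.append((i, j, d))
    return edges


def mask_residues(sequence, mask_rate=0.15, mask_token="X", seed=0):
    """Hide a random subset of residues for a reconstruction objective.

    Returns the masked sequence and a dict mapping each masked position
    to the residue a model would be trained to recover.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(sequence)))
    masked_positions = sorted(rng.sample(range(len(sequence)), n_mask))
    masked = list(sequence)
    targets = {}
    for pos in masked_positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return "".join(masked), targets


# Toy peptide: five residues placed 3.8 angstroms apart along a straight line.
sequence = "GIGAV"
coords = [(3.8 * k, 0.0, 0.0) for k in range(len(sequence))]

edges = build_residue_graph(coords)
masked_sequence, targets = mask_residues(sequence, mask_rate=0.4)
```

In this toy layout each residue is linked to neighbours up to two positions away, since a separation of three positions (11.4 Å) exceeds the assumed cutoff. A real pipeline would take coordinates from experimental or predicted structures and would typically attach richer node and edge features than a single distance.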

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set; Top 1%; 18.7%
2. Journal of Chemical Information and Modeling: 207 papers in training set; Top 0.4%; 14.4%
3. Briefings in Bioinformatics: 326 papers in training set; Top 0.5%; 8.5%
4. Nature Machine Intelligence: 61 papers in training set; Top 0.3%; 6.9%
5. Nature Communications: 4913 papers in training set; Top 33%; 4.9%
50% of probability mass above
6. Advanced Science: 249 papers in training set; Top 5%; 3.6%
7. Chemical Science: 71 papers in training set; Top 0.4%; 3.3%
8. Bioinformatics Advances: 184 papers in training set; Top 2%; 2.1%
9. Journal of Cheminformatics: 25 papers in training set; Top 0.2%; 2.1%
10. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 30%; 1.9%
11. Nature Biotechnology: 147 papers in training set; Top 5%; 1.7%
12. Nucleic Acids Research: 1128 papers in training set; Top 11%; 1.7%
13. Cell Systems: 167 papers in training set; Top 7%; 1.7%
14. PLOS Computational Biology: 1633 papers in training set; Top 16%; 1.7%
15. Communications Chemistry: 39 papers in training set; Top 0.3%; 1.7%
16. Scientific Reports: 3102 papers in training set; Top 61%; 1.5%
17. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 6%; 1.3%
18. Cell Reports Methods: 141 papers in training set; Top 3%; 1.2%
19. Patterns: 70 papers in training set; Top 1%; 1.2%
20. Artificial Intelligence in the Life Sciences: 11 papers in training set; Top 0.2%; 0.9%
21. Nature Methods: 336 papers in training set; Top 6%; 0.9%
22. iScience: 1063 papers in training set; Top 29%; 0.8%
23. Frontiers in Molecular Biosciences: 100 papers in training set; Top 5%; 0.8%
24. Journal of Chemical Theory and Computation: 126 papers in training set; Top 0.8%; 0.8%
25. Communications Biology: 886 papers in training set; Top 26%; 0.7%
26. mAbs: 28 papers in training set; Top 0.4%; 0.6%
27. Journal of Molecular Biology: 217 papers in training set; Top 4%; 0.6%
28. Molecules: 37 papers in training set; Top 3%; 0.5%
29. BMC Bioinformatics: 383 papers in training set; Top 8%; 0.5%