
Explainable protein-protein binding affinity prediction via fine-tuning protein language models

Singh, H.; Singh, R. K.; Srivastava, S. P.; Pradhan, S.; Gorantla, R.

2026-04-01 bioinformatics
10.64898/2026.03.30.715237 bioRxiv

Predicting protein-protein binding affinity from sequence alone remains a bottleneck for antibody optimisation, biologics design and large-scale affinity modelling. Structure-based methods achieve high accuracy but cannot scale when complex structures are unavailable. Here we present a framework that reframes affinity prediction as metric learning: two proteins are projected into a shared latent space in which cosine similarity directly correlates with experimental binding affinity, and the protein language model encoder is adapted through parameter-efficient fine-tuning (PEFT). On the PPB-Affinity benchmark, the model achieves Pearson r = 0.89 on a random split, generalises to evolutionarily distant proteins (r = 0.61 at < 30% sequence identity) and surpasses structure-based deep learning baselines across biological subgroups, without any three-dimensional input. On the strictly de-overlapped AB-Bind dataset, few-shot adaptation with 30% of assay data (Pearson r = 0.756, RMSE = 0.688) outperforms methods trained on 90% of data; consistent gains are observed across nine diverse AbBiBench deep-mutational-scanning assays with 10-30% labelled variants. Residue-level explainability reveals that the model concentrates importance on interface-localised residues aligned with experimentally validated interaction hotspots across enzyme-inhibitor and antibody-antigen systems. Together, these results establish a scalable, explainable and data-efficient route to protein-protein binding affinity prediction and therapeutic antibody optimisation from sequence alone.
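The metric-learning formulation described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration only: the paper's actual encoder is a pretrained protein language model adapted with PEFT, whereas here a tiny randomly initialised encoder stands in so the example is self-contained; all names, sequences and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of affinity prediction as metric learning: embed both
# proteins into a shared latent space and use cosine similarity as the
# affinity score, regressed onto (normalised) experimental labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

AA = "ACDEFGHIKLMNPQRSTVWY"
AA_IDX = {a: i for i, a in enumerate(AA)}

def tokenize(seq: str) -> torch.Tensor:
    """Map a protein sequence to residue indices."""
    return torch.tensor([AA_IDX[a] for a in seq])

class ToyProteinEncoder(nn.Module):
    """Stand-in for a PLM encoder: embed residues, mean-pool, project.
    The real model would be a frozen PLM with PEFT (e.g. LoRA) adapters."""
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(len(AA), d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.embed(tokens).mean(dim=0)  # mean-pool over residues
        return self.proj(h)

def predicted_affinity(encoder: nn.Module, seq_a: str, seq_b: str) -> torch.Tensor:
    """Cosine similarity of the two embeddings is the affinity score."""
    za = encoder(tokenize(seq_a))
    zb = encoder(tokenize(seq_b))
    return F.cosine_similarity(za, zb, dim=0)

# Training: fit cosine similarity to affinity labels scaled to [0, 1].
# Sequences and labels below are made up for the sake of the example.
encoder = ToyProteinEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
pairs = [("MKTAYIAK", "GSHMLEDP", 0.9), ("ACDEFGHI", "KLMNPQRS", 0.1)]
for _ in range(50):
    loss = sum((predicted_affinity(encoder, a, b) - y) ** 2 for a, b, y in pairs)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the score is a cosine similarity, it is bounded in [-1, 1] regardless of sequence length, which is what lets a single scalar be compared across pairs drawn from different protein families.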

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank  Journal                                          Papers in training set  Percentile  Probability
 1    Nature Communications                              4913                  Top 3%        22.5%
 2    Cell Systems                                        167                  Top 0.9%      10.4%
 3    Nature Methods                                      336                  Top 1%         7.2%
 4    Nature Biotechnology                                147                  Top 1%         6.4%
 5    Science                                             429                  Top 5%         6.4%
----- 50% of probability mass above -----
 6    Nature Machine Intelligence                          61                  Top 0.7%       4.3%
 7    Nature                                              575                  Top 7%         3.6%
 8    Advanced Science                                    249                  Top 6%         3.1%
 9    Proceedings of the National Academy of Sciences    2130                  Top 23%        3.1%
10    Nucleic Acids Research                             1128                  Top 7%         2.9%
11    Communications Biology                              886                  Top 4%         2.4%
12    Bioinformatics                                     1061                  Top 6%         2.1%
13    PLOS Computational Biology                         1633                  Top 15%        1.9%
14    mAbs                                                 28                  Top 0.2%       1.7%
15    eLife                                              5422                  Top 47%        1.3%
16    Briefings in Bioinformatics                         326                  Top 5%         1.2%
17    Nature Chemical Biology                             104                  Top 2%         1.2%
18    Cell Genomics                                       162                  Top 5%         1.1%
19    Nature Genetics                                     240                  Top 7%         0.8%
20    Genome Medicine                                     154                  Top 7%         0.8%
21    Patterns                                             70                  Top 2%         0.7%
22    Science Advances                                   1098                  Top 30%        0.7%
23    Scientific Reports                                 3102                  Top 75%        0.7%
24    Genome Biology                                      555                  Top 8%         0.6%
25    Frontiers in Immunology                             586                  Top 9%         0.6%
26    The American Journal of Human Genetics              206                  Top 4%         0.6%