
Benchmarking Boltz-2 for Screening of Therapeutic Antibody-Antigen Interactions

Fieux-Castagnet, A.; Waton, J.; Glukhonemykh, A.; Snow, E.; Ashokkumar, R.; Fleming, J.; Champagne, D.; Devenyns, T.; Peluffo, A.; Anagnostopoulos, C.

2026-05-14 bioinformatics
10.64898/2026.05.13.724924 bioRxiv

Protein structure prediction models (such as AlphaFold, Chai, and Boltz) have transformed structural biology and are increasingly explored for drug discovery; however, their utility for large-scale screening of antibody-antigen (AB-AG) interactions remains unclear, particularly for distinguishing true binding pairs from non-binding pairs at scale. To our knowledge, there has not been an exhaustive exploration of Boltz-2 inference settings on this high-impact problem, and in this paper we describe and implement a novel benchmarking framework that can accelerate progress in the field. We evaluated Boltz-2 (NVIDIA NIM implementation) on 519 therapeutic monoclonal antibodies from Thera-SAbDab, pairing each antibody with its cognate target and with a randomly assigned non-cognate antigen. The evaluation framework systematically captures variability across stochastic seeds while benchmarking different inference settings, including datasets with and without crystallographically resolved antibody structures. Across settings, Boltz-2-derived confidence metrics showed weak, though above-chance, discrimination (0.50 < ROC-AUC < 0.60). Among the evaluated metrics, the minimum value of the interface predicted TM-score across seed-samples (ipTM-min) captured the strongest signal. Additional feature aggregation and multivariate modelling provided little to no improvement. Increasing the number of stochastic predictions yielded front-loaded gains, with diminishing returns beyond ~15-20 seed-samples, suggesting limited value of extensive sampling in practical workflows. Notably, inference without multiple sequence alignments (MSAs) slightly improved performance on non-crystallized antibodies (ΔAUROC ≈ +0.027) while reducing runtime by ~8 seconds per prediction compared to shallow MSA settings.
Overall, these results indicate that off-the-shelf confidence metrics from general-purpose structure prediction models may be insufficient for reliable target-antibody screening and highlight the need for task-specific optimization. They also confirm that a modest amount of sampling can help but is not in itself sufficient to improve performance substantially, because gains plateau relatively quickly.
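
As a rough illustration of the scoring described in the abstract, the sketch below takes the minimum ipTM across seed-samples for each antibody-antigen pair and computes ROC-AUC between cognate and non-cognate pairs. It is a minimal sketch, not the authors' pipeline: the function names (`iptm_min`, `roc_auc`) and the toy ipTM values are hypothetical, and the AUC uses the plain pairwise (Mann-Whitney) definition rather than any particular library.

```python
from typing import List, Sequence, Tuple


def iptm_min(seed_scores: Sequence[float]) -> float:
    """Aggregate the per-seed ipTM values of one pair by taking the minimum."""
    return min(seed_scores)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive scores higher; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (label, ipTM per seed-sample).
# Label 1 = cognate pair, label 0 = randomly assigned non-cognate pair.
pairs: List[Tuple[int, List[float]]] = [
    (1, [0.62, 0.55, 0.70]),
    (1, [0.40, 0.35, 0.52]),
    (0, [0.48, 0.30, 0.45]),
    (0, [0.58, 0.37, 0.41]),
]

labels = [y for y, _ in pairs]
scores = [iptm_min(seeds) for _, seeds in pairs]  # one ipTM-min per pair
auc = roc_auc(labels, scores)
print(f"ipTM-min per pair: {scores}, ROC-AUC: {auc:.2f}")  # ROC-AUC: 0.75
```

An AUC of 0.5 corresponds to chance, so the reported range of 0.50 to 0.60 means a cognate pair outscores a non-cognate pair only slightly more often than not.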

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Journal of Cheminformatics; 25 papers in training set; Top 0.1%; 14.3%
2. Journal of Chemical Information and Modeling; 207 papers in training set; Top 0.7%; 8.4%
3. Bioinformatics; 1061 papers in training set; Top 4%; 6.4%
4. Protein Science; 221 papers in training set; Top 0.2%; 6.4%
5. Briefings in Bioinformatics; 326 papers in training set; Top 1%; 4.8%
6. Nature Communications; 4913 papers in training set; Top 33%; 4.8%
7. PLOS Computational Biology; 1633 papers in training set; Top 8%; 4.3%
8. Structure; 175 papers in training set; Top 0.7%; 4.0%

50% of probability mass above

9. Communications Biology; 886 papers in training set; Top 2%; 3.7%
10. Nature Methods; 336 papers in training set; Top 3%; 3.6%
11. Cell Systems; 167 papers in training set; Top 5%; 3.1%
12. mAbs; 28 papers in training set; Top 0.1%; 2.1%
13. Proteins: Structure, Function, and Bioinformatics; 82 papers in training set; Top 0.4%; 1.9%
14. Bioinformatics Advances; 184 papers in training set; Top 2%; 1.9%
15. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 4%; 1.8%
16. Scientific Reports; 3102 papers in training set; Top 55%; 1.8%
17. Journal of Structural Biology; 58 papers in training set; Top 1.0%; 1.3%
18. Frontiers in Immunology; 586 papers in training set; Top 5%; 1.3%
19. Communications Chemistry; 39 papers in training set; Top 0.5%; 1.2%
20. Journal of Molecular Biology; 217 papers in training set; Top 3%; 0.9%
21. Advanced Science; 249 papers in training set; Top 16%; 0.9%
22. Nature Machine Intelligence; 61 papers in training set; Top 3%; 0.9%
23. Acta Crystallographica Section D Structural Biology; 54 papers in training set; Top 0.3%; 0.9%
24. eLife; 5422 papers in training set; Top 54%; 0.9%
25. Cell Reports Methods; 141 papers in training set; Top 5%; 0.8%
26. PeerJ; 261 papers in training set; Top 13%; 0.8%
27. Chemical Science; 71 papers in training set; Top 2%; 0.7%
28. PLOS ONE; 4510 papers in training set; Top 67%; 0.7%
29. Patterns; 70 papers in training set; Top 2%; 0.7%
30. Nature Biotechnology; 147 papers in training set; Top 8%; 0.7%
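
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is only an illustration: the helper name `journals_to_reach` is hypothetical, and the values are the percentages of the ten highest-ranked journals copied from the list above.

```python
from itertools import accumulate
from typing import Optional, Sequence

# Predicted probabilities (percent) for ranks 1-10, copied from the list.
probs = [14.3, 8.4, 6.4, 6.4, 4.8, 4.8, 4.3, 4.0, 3.7, 3.6]


def journals_to_reach(values: Sequence[float], threshold: float) -> Optional[int]:
    """Return the smallest k such that the first k values sum to at least
    `threshold`, or None if the threshold is never reached."""
    for k, total in enumerate(accumulate(values), start=1):
        if total >= threshold:
            return k
    return None


k = journals_to_reach(probs, 50.0)
print(k)  # 8: ranks 1-7 sum to about 49.4%, ranks 1-8 to about 53.4%
```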