Contrastive learning for antibody-antigen sequence-to-specificity prediction

Lee, H.; Castro, K.; Renwick, S.; Stalder, L.; Glanzer, W.; Kumar, R.; Chen, N.; Scheck, A.; Yermanos, A.; Mason, D.; Reddy, S. T.

2026-02-26 immunology
10.64898/2026.02.25.707916 bioRxiv

Predicting which antibodies bind to which antigens directly from primary amino acid sequences remains a major challenge, as no current method can reliably determine this specificity at both a repertoire and proteome scale. Structure-based protein design frameworks can propose antibody binders to specified antigenic epitopes, but they do not solve the "sequence-to-specificity" task of mapping antibodies to cognate epitopes, and vice versa. Here, we introduce CALM (Cross-attention Adaptive Immune Receptor-Antigen Language Model), a dual-encoder plus cross-attentive decoder architecture that treats antibody-antigen recognition as molecular translation. Using contrastive learning, antigen and antibody encoders learn a shared embedding space that aligns cognate epitope-paratope binding pairs. CALM-1.0 is trained and evaluated on 4,138 curated antibody-antigen pairs obtained from the PDB-derived structural antibody database (SAbDab). On a leakage-controlled test split drawn from sequence clusters at 80% identity and unseen during training, CALM-1.0 achieves a mean top-1 retrieval (R@1) of 7%, with consistent performance across both directions (Ab→Ag and Ag→Ab). CALM establishes a foundation for bidirectional antibody-antigen sequence-to-specificity prediction with the potential to unify retrieval and generative design.
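
To make the contrastive alignment and the bidirectional R@1 metric concrete, here is a minimal sketch in plain Python. It is not the CALM implementation: the abstract does not state the loss, so a symmetric InfoNCE (CLIP-style) objective is assumed here, and the temperature of 0.07, the function names, and the 3-dimensional toy embeddings are all hypothetical. Row i of the similarity matrix is antibody i, column j is antigen j, and the diagonal holds the cognate pairs.

```python
import math

def cosine_similarity_matrix(ab_embs, ag_embs):
    """Return S where S[i][j] is the cosine similarity of antibody i and antigen j."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]
    ab = [normalize(v) for v in ab_embs]
    ag = [normalize(v) for v in ag_embs]
    return [[sum(a * g for a, g in zip(u, w)) for w in ag] for u in ab]

def transpose(matrix):
    """Swap the query and candidate roles (Ab->Ag becomes Ag->Ab)."""
    return [list(col) for col in zip(*matrix)]

def row_cross_entropy(sim, temperature):
    """Mean softmax cross-entropy over rows, with the diagonal entry as the target."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(sim)

def symmetric_info_nce(sim, temperature=0.07):
    """Average of the Ab->Ag and Ag->Ab contrastive losses (assumed objective)."""
    return 0.5 * (row_cross_entropy(sim, temperature)
                  + row_cross_entropy(transpose(sim), temperature))

def recall_at_1(sim):
    """Fraction of queries (rows) whose top-scoring candidate is the cognate partner."""
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == i)
    return hits / len(sim)

# Toy embeddings (hypothetical): three antibody/antigen pairs in a shared 3-D space.
ab_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ag_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.6, 0.0, 0.5]]

sim = cosine_similarity_matrix(ab_embs, ag_embs)
loss = symmetric_info_nce(sim)
ab_to_ag = recall_at_1(sim)             # antibodies as queries, antigens as candidates
ag_to_ab = recall_at_1(transpose(sim))  # antigens as queries, antibodies as candidates
mean_r_at_1 = 0.5 * (ab_to_ag + ag_to_ab)

print(f"loss={loss:.4f}  Ab->Ag R@1={ab_to_ag:.2f}  Ag->Ab R@1={ag_to_ab:.2f}")
```

In this toy case the two retrieval directions give different scores, because the third antigen embedding sits closer to the first antibody than to its own partner. That is why R@1 is computed separately for each direction before being averaged.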

Matching journals

The top 5 journals together account for just over 50% (about 52.8%) of the predicted probability mass.

| Rank | Journal | Papers in training set | Top % | Predicted probability |
|---|---|---|---|---|
| 1 | Cell Systems | 167 | 0.5% | 17.4% |
| 2 | Science | 429 | 2% | 12.3% |
| 3 | Nature Communications | 4913 | 22% | 8.4% |
| 4 | Nature Computational Science | 50 | 0.1% | 8.4% |
| 5 | Nature Methods | 336 | 2% | 6.3% |
| 6 | Nature | 575 | 5% | 4.8% |
| 7 | Proceedings of the National Academy of Sciences | 2130 | 20% | 3.6% |
| 8 | Bioinformatics | 1061 | 6% | 2.4% |
| 9 | Nature Biotechnology | 147 | 4% | 1.9% |
| 10 | eLife | 5422 | 38% | 1.9% |
| 11 | Frontiers in Immunology | 586 | 4% | 1.8% |
| 12 | Patterns | 70 | 0.9% | 1.7% |
| 13 | Nature Medicine | 117 | 2% | 1.7% |
| 14 | Advanced Science | 249 | 11% | 1.7% |
| 15 | Communications Biology | 886 | 10% | 1.7% |
| 16 | Nature Machine Intelligence | 61 | 2% | 1.5% |
| 17 | Structure | 175 | 2% | 1.3% |
| 18 | Scientific Reports | 3102 | 67% | 1.2% |
| 19 | mAbs | 28 | 0.2% | 1.2% |
| 20 | Genome Medicine | 154 | 6% | 1.1% |
| 21 | PLOS Computational Biology | 1633 | 20% | 1.1% |
| 22 | Nature Immunology | 71 | 2% | 0.9% |
| 23 | Cell Reports | 1338 | 32% | 0.8% |
| 24 | Nucleic Acids Research | 1128 | 18% | 0.7% |
| 25 | PLOS ONE | 4510 | 68% | 0.7% |
| 26 | iScience | 1063 | 35% | 0.7% |
| 27 | ACS Synthetic Biology | 256 | 3% | 0.7% |

50% of probability mass lies above rank 6 (journals ranked 1 to 5).