High-Throughput Machine Learning-Aided Antibody Discovery for Cell Surface Antigens

Kothiwal, D.; Kollasch, A. W.; Hollmer, N.; Ghosh, A.; Zhang, R.; Anuganti, M.; Paul, S. B.; Zagar, Y.; Abdollahi, M.; Anderson, Z.; Belay, F.; Salotto, M.; Ulmer, S.; AbdelAlim, Y. A.; Kumar, S.; Vangala, M.; Yang, C.; Chedotal, A.; Jardine, J. G.; Teixeira, A. A. R.; Moshinsky, D. J.; Zhu, H.; Zhu, S.; Springer, T. A.; Marks, D. S.; Meijers, R.

2025-05-15 biophysics
10.1101/2025.05.15.650607 bioRxiv

Machine learning (ML) has the potential to revolutionize antibody design and selection, but its success depends on access to extensive, well-curated datasets of antibody-antigen interactions. To address this need, we developed a synthetic Fab yeast display library optimized for seamless ML integration, focusing on sequence diversity within the CDRH3 loop. The library incorporates key sequence features derived from human B cell repertoires that are essential for efficient antibody generation, captured in a compact antigen recognition module (ARM) format. Built using the VH1-69 heavy chain and four light chains, the library was evaluated against ten human and murine cell surface antigens, including PD-L1, TIGIT, and ROBO1. This approach yielded hundreds of antibodies with robust biophysical properties, validated for functional performance in flow cytometry and immunohistochemistry. Furthermore, ML analysis identified additional antibodies for ROBO2 and PD-L2 from the aggregate sequencing data, demonstrating utility for hybrid in silico and experimental workflows. We provide a publicly accessible dataset comprising more than 68,000 Fab sequences and 486 characterized antibodies. This study establishes an ML-compatible framework designed to accelerate and streamline antibody discovery and development.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | mAbs | 28 | Top 0.1% | 33.0% |
| 2 | Nature Communications | 4913 | Top 22% | 8.4% |
| 3 | Nucleic Acids Research | 1128 | Top 5% | 3.7% |
| 4 | Proceedings of the National Academy of Sciences | 2130 | Top 19% | 3.7% |
| 5 | Journal of the American Chemical Society | 199 | Top 2% | 3.6% |
| 6 | Antibody Therapeutics | 16 | Top 0.1% | 3.1% |
| 7 | Nature Methods | 336 | Top 3% | 3.1% |
| 8 | Advanced Science | 249 | Top 7% | 2.6% |
| 9 | ACS Central Science | 66 | Top 0.9% | 1.9% |
| 10 | Communications Biology | 886 | Top 8% | 1.7% |
| 11 | Cell Chemical Biology | 81 | Top 2% | 1.7% |
| 12 | Cell Discovery | 54 | Top 3% | 1.7% |
| 13 | eLife | 5422 | Top 45% | 1.5% |
| 14 | iScience | 1063 | Top 18% | 1.5% |
| 15 | Cell Reports Methods | 141 | Top 3% | 1.5% |
| 16 | Scientific Reports | 3102 | Top 64% | 1.3% |
| 17 | Structure | 175 | Top 2% | 1.2% |
| 18 | Angewandte Chemie International Edition | 81 | Top 3% | 1.1% |
| 19 | PLOS ONE | 4510 | Top 62% | 1.1% |
| 20 | Nature Biotechnology | 147 | Top 6% | 0.9% |
| 21 | Computational and Structural Biotechnology Journal | 216 | Top 7% | 0.9% |
| 22 | Science Advances | 1098 | Top 26% | 0.9% |
| 23 | RSC Chemical Biology | 32 | Top 0.6% | 0.7% |
| 24 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.7% |
| 25 | Protein Science | 221 | Top 2% | 0.7% |
| 26 | Science | 429 | Top 21% | 0.6% |
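
The 50% cutoff stated above the journal list can be checked by summing the listed probabilities in rank order. The following is a minimal sketch of that arithmetic; the function name `journals_to_reach` is hypothetical, and the values are copied from the first eight entries of the list rather than taken from any published API.

```python
# Predicted probabilities (%) for ranks 1-8, copied from the journal list above.
probabilities = [33.0, 8.4, 3.7, 3.7, 3.6, 3.1, 3.1, 2.6]


def journals_to_reach(mass_percent, probs):
    """Return (k, cumulative) for the smallest k whose top-k sum reaches mass_percent.

    Hypothetical helper for illustration; returns (None, total) if the
    threshold is never reached.
    """
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= mass_percent:
            return k, round(total, 1)
    return None, round(total, 1)


k, cumulative = journals_to_reach(50.0, probabilities)
print(k, cumulative)  # prints: 5 52.4
```

The running totals are 33.0, 41.4, 45.1, 48.8, and 52.4, so the threshold is first crossed at rank 5, consistent with the statement that the top 5 journals account for roughly 50% of the predicted probability mass.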