mAbs
Informa UK Limited
All preprints, ranked by how well they match mAbs's content profile, based on 28 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Sinha, A.; Park, J. M.; Gulzar, N.; Pandya, D. N.; Wadas, T. J.; Scott, J. K.
We report a functional pipeline for facile conversion of variable Fv domains, typically discovered in antibody discovery programs, into chimeric monoclonal antibodies (mAbs). Often, in initial screenings, a set of candidate mAbs is produced in small volumes and purified from supernatant for testing. Our pipeline also simplifies purification of mAbs by using an extended histidine tag (His-10) fused to the C-terminus of the light chain. Both the length of the His-10 and its location have been shown to affect the efficacy of mAb purification using an inexpensive nickel-based resin at neutral pH. Our antibody cloning and purification pipeline, when followed together with detection and affinity measurements, can be smoothly incorporated into an antibody discovery workflow.
Capel, H. L.; Ellmen, I.; Murray, C. J.; Mignone, G.; Black, M.; Clarke, B.; Breen, C.; Tierney, S.; Dougan, P.; Buick, R. J.; Greenshields-Watson, A.; Deane, C. M.
In developing therapeutic antibodies, the heavy chain is often prioritised due to its higher variability and its central role in antigen binding. An appropriate pairing of the light sequence is, however, important for antibody function. Here we present LICHEN, a heavy chain conditioned light sequence generation tool that enables collaborative light sequence design by leveraging computational capabilities alongside experimental expertise. LICHEN generates light sequences which are valid (antibody-like), diverse in sequence and structure, and conditioned on a specific heavy chain. LICHEN can also condition on germline and CDRs and automatically filter generated sequences for required properties. This allows LICHEN to be used across multiple antibody development use cases. We carry out experimental validation of the method conditioning only on the heavy sequence and on the heavy sequence and binding information. Our in vitro results show that sequences created by LICHEN have effective expression yields and can retain antigen-binding.
Ramanujan, S.; Mazrooei, P.; O'Neil, D.; Chen, B.; Izadi, S.
Monoclonal antibodies (mAbs) with long systemic persistence are widely used as therapeutics. However, antibodies with atypically fast clearance require more dosing, limiting their clinical usefulness. Deep learning can facilitate using sequence-based modeling to predict potential pharmacokinetic (PK) liabilities before antibody generation. Assembling a dataset of 103 mAbs with measured nonspecific clearance in cynomolgus monkeys (cyno), and using transfer learning from large protein language models, we developed multiple machine learning models to predict mAb clearance as fast/slow clearing. Focusing on minimizing misclassification of potentially promising molecules as fast clearing, our results show that using physicochemical properties yielded up to 73.1 ± 1.1% classification accuracy on hold-out test data (precision 65.2 ± 2.3%). Using only sequence-based features from deep learning protein language models yielded a comparable performance of 71 ± 1.4% (precision 65.5 ± 2.5%). Combining structural and deep learning derived features yielded a similar accuracy of 73.9 ± 1.1%, and slightly improved precision (68.3 ± 2.4%). Features important for classifying fast/slow clearance point to charge, moment, and surface area properties at pH 7.4 as well as deep learning derived features. These results suggest that protein language models provide information and clearance-prediction performance comparable to those of physicochemical features. This work provides a foundation for in silico prediction of protein pharmacokinetics to inform antibody candidate generation and early deprioritization of designs with high risk of fast clearance. More generally, it illustrates the value of transfer learning-based application of protein language models to address characteristics of importance for protein therapeutics.
Li, B.; Luo, S.; Wang, W.; Xu, J.; Liu, D.; Shameem, M.; Mattila, J.; Franklin, M.; Hawkins, P. G.; Atwal, G. S.
Selection of lead therapeutic molecules is often driven predominantly by pharmacological efficacy and safety. Candidate developability, such as biophysical properties that affect the formulation of the molecule into a product, is usually evaluated only toward the end of the drug development pipeline. The ability to evaluate developability properties early in the process of antibody therapeutic development could accelerate the timeline from discovery to clinic and save considerable resources. In silico predictive approaches, such as machine learning models, which map molecules to predictions of developability properties could offer a cost-effective and high-throughput alternative to experiments for antibody developability assessment. We developed a computational framework, PROPERMAB, for large-scale and efficient in silico prediction of developability properties for monoclonal antibodies, using custom molecular features and machine learning modeling. We demonstrate the power of PROPERMAB by using it to develop models to predict antibody hydrophobic interaction chromatography retention time and high-concentration viscosity. We further show that structure-derived features can be rapidly and accurately predicted directly from sequences by pre-training simple models for molecular features, thus providing the ability to scale these approaches to repertoire-scale sequence datasets.
Evers, A.; Malhotra, S.; Bolick, W.-G.; Najafian, A.; Borisovska, M.; Warszawski, S.; Fomekong Nanfack, Y.; Kuhn, D.; Rippmann, F.; Crespo, A.; Sood, V.
To select the most promising screening hits from antibody and VHH display campaigns for subsequent in-depth profiling and optimization, it is highly desirable to assess and select sequences on properties beyond only their binding signals from the sorting process. In addition, developability risk criteria, sequence diversity and the anticipated complexity for sequence optimization are relevant attributes for hit selection and optimization. Here, we describe an approach for the in silico developability assessment of antibody and VHH sequences. This method not only allows for ranking and filtering multiple sequences with regard to their predicted developability properties and diversity, but also visualizes relevant sequence and structural features of potentially problematic regions and thereby provides rationales and starting points for multi-parameter sequence optimization.
Chinery, L.; Hummer, A. M.; Mehta, B. B.; Akbar, R.; Rawat, P.; Slabodkin, A.; Le Quy, K.; Lund-Johansen, F.; Greiff, V.; Jeliazkov, J. R.; Deane, C. M.
Strong antibody-antigen binding is the primary consideration when developing an efficacious therapeutic antibody. In recent years, much work has been devoted to applying complex machine learning models to this cause, yet simple baselines are often lacking. Here, we show that the widely used sequence alignment method, BLOSUM, can yield diverse, binder-enriched libraries from a single starting antibody. Using Trastuzumab-HER2 as a model system, we experimentally validated 720 novel designs generated with five different computational methods using surface plasmon resonance. The BLOSUM substitution matrix outperformed all four deep learning design approaches tested, achieving an estimated minimum binder enrichment of 12.5% and producing nine sub-nanomolar binders. These results underscore the importance of comparing against simple baselines and set a benchmark to guide future computational antibody library design.
Kothiwal, D.; Kollasch, A. W.; Hollmer, N.; Ghosh, A.; Zhang, R.; Anuganti, M.; Paul, S. B.; Zagar, Y.; Abdollahi, M.; Anderson, Z.; Belay, F.; Salotto, M.; Ulmer, S.; AbdelAlim, Y. A.; Kumar, S.; Vangala, M.; Yang, C.; Chedotal, A.; Jardine, J. G.; Teixeira, A. A. R.; Moshinsky, D. J.; Zhu, H.; Zhu, S.; Springer, T. A.; Marks, D. S.; Meijers, R.
Machine learning (ML) has the potential to revolutionize antibody design and selection, but its success depends on access to extensive, well-curated datasets of antibody-antigen interactions. To address this need, we developed a synthetic Fab yeast display library optimized for seamless ML integration, focusing on sequence diversity within the CDRH3 loop. The library incorporates key sequence features derived from human B cell repertoires essential for efficient antibody generation captured in a compact antigen recognition module (ARM) format. Built using the VH1-69 heavy chain and four light chains, the library was evaluated against ten human and murine cell surface antigens, including PD-L1, TIGIT, and ROBO1. This approach yielded hundreds of antibodies with robust biophysical properties, validated for functional performance in flow cytometry and immunohistochemistry. Furthermore, ML analysis identified additional antibodies for ROBO2 and PD-L2 from the aggregate sequencing data, demonstrating utility for hybrid in silico and experimental workflows. We provide a publicly accessible dataset comprising more than 68,000 Fab sequences and 486 characterized antibodies. This study establishes an ML-compatible framework designed to accelerate and streamline antibody discovery and development.
Aguilar Rangel, M.; Bedwell, A.; Costanzi, E.; Ricagno, S.; Frydman, J.; Vendruscolo, M.; Sormanni, P.
De novo design methods hold the promise of reducing the time and cost of antibody discovery, while enabling the facile and precise targeting of predetermined epitopes. Here we describe a fragment-based method for the combinatorial design of antibody binding loops and their grafting onto antibody scaffolds. We designed and tested six single-domain antibodies targeting different epitopes on three antigens, including the receptor-binding domain of the SARS-CoV-2 spike protein. Biophysical characterisation showed that all designs are highly stable, and bind their intended targets with affinities in the nanomolar range without any in vitro affinity maturation. We further discuss how a high-resolution input antigen structure is not required, as our method yields similar predictions when the input is a crystal structure or a computer-generated model. This computational procedure, which readily runs on a laptop, provides a starting point for the rapid generation of lead antibodies binding to pre-selected epitopes. Summary: A combinatorial method can rapidly design nanobodies for predetermined epitopes, which bind with KDs in the nanomolar range.
Manso, T.; Sanou, G.; Nousias, C.; Maalem, I.; Boutin, F.; Giudicelli, V.; Duroux, P.; Lefranc, M.-P.; Kossida, S.
Monoclonal antibodies (mAbs) and fusion proteins for immune applications (FPIA) play a crucial role in treating autoimmune diseases and cancers by targeting cell-surface proteins and triggering multiple immune mechanisms. These functions are mediated by the fragment crystallizable (Fc) region of mAbs and fusion proteins, whose interaction with Fc gamma receptors (FcγRs) can be modulated through Fc amino acid (AA) engineering. To address this, we developed the IMGT/FcVariantsExplorer tool (https://www.imgt.org/fcvariantsexplorer/) to identify AA changes within the Fc region in mAb and fusion protein sequences from IMGT/2Dstructure-DB, the AA sequence database of IMGT®, the international ImMunoGeneTics information system®. We used the IMGT® nomenclature of engineered Fc variants involved in antibody effector properties and formats, applying a standardized classification in five categories: Effector, Half-life, Physicochemical properties, Structure, and Hybrid. We analyzed sequences of 1,107 mAbs and fusion proteins, identifying 483 entries with Fc AA changes, resulting in 211 unique Fc variants in the dataset. We also used web scraping to retrieve associated biological data from literature. All data have been integrated into IMGT/mAb-DB, with links to sequences in IMGT/2Dstructure-DB, enabling users to query Fc variants by their Category or Effect. This curated dataset reveals key trends in antibody engineering.
Santolla, N.; Pridgen, T.; Nigam, P.; Ford, C. T.
The discovery of therapeutic antibodies is a traditionally arduous process. Today, the lab-based process of antibody discovery consists of several time-consuming steps that involve live animal immunization, B-cell harvesting, hybridoma creation, and then downstream engineering and evaluation. However, the use of artificial intelligence in drug design has previously been shown effective in the rapid generation of protein-specific binders, small molecules, and even antibody therapeutics, thereby replacing some of the primary steps of the drug discovery process. Here we present peleke-1, a suite of protein language models fine-tuned from state-of-the-art large language models using curated antibody-antigen complex data. These models generate targeted antibody Fv sequences for a given antigen sequence input at scale. This suite of models provides a reliable, artificial intelligence-driven approach for in silico therapeutic antibody discovery along with an open-source framework for future antibody language model tuning.
Harvey, E. P.; Shin, J.-E.; Skiba, M. A.; Nemeth, G. R.; Hurley, J. D.; Wellner, A.; Shaw, A. Y.; Miranda, V. G.; Min, J. K.; Liu, C. C.; Marks, D. S.; Kruse, A.
Antibodies are essential biological research tools and important therapeutic agents, but some exhibit non-specific binding to off-target proteins and other biomolecules. Such polyreactive antibodies compromise screening pipelines, lead to incorrect and irreproducible experimental results, and are generally intractable for clinical development. We designed a set of experiments using a diverse naive synthetic camelid antibody fragment (nanobody) library to enable machine learning models to accurately assess polyreactivity from protein sequence (AUC > 0.8). Moreover, our models provide quantitative scoring metrics that predict the effect of amino acid substitutions on polyreactivity. We experimentally tested our models' performance on three independent nanobody scaffolds, where over 90% of predicted substitutions successfully reduced polyreactivity. Importantly, the model allowed us to diminish the polyreactivity of an angiotensin II type I receptor antagonist nanobody, without compromising its pharmacological properties. We provide a companion web-server that offers a straightforward means of predicting polyreactivity and polyreactivity-reducing mutations for any given nanobody sequence.
Hepler, N. L.; Hill, A. J.; Jaffe, D. B.; Gibbons, M. C.; Pfeiffer, K. A.; Hilton, D. M.; Freeman, M.; McDonnell, W. J.
GLIMPSE-1 is a protein language model trained solely on paired human antibody sequences. It captures immunological features and achieves best-in-class performance in humanization benchmarks. We demonstrate the utility of GLIMPSE-1 in humanization; engineering of antibodies for affinity, species cross-reactivity, and key developability parameters; and the creation of highly divergent functional variants with <90% sequence identity to a marketed antibody. Learning exclusively from human antibody data enables GLIMPSE-1 to enhance therapeutics and native antibodies based on patterns in the human repertoire. Disclaimer: While we provide detailed descriptions of experimental methods and success metrics, certain methodological details of GLIMPSE-1 remain proprietary and/or redacted in this work for commercial considerations. We warmly invite researchers and potential collaborators interested in accessing GLIMPSE-1 to connect with our team via partnerships@infinimmune.com.
Raj Unnikandam Veettil, S.; Donatelli, J.; Kalra, G.; Veronica Ljubetic San Martin, C.; Ramakrishnan, S.; McGregor, C.; Wallace, M.; Ankala, R.; Rodrigues de Souza Pinto, L.; Dhama, A.; Regens, C.; Li, Y.; Smith, D.
The generation of clonal CHO cell lines is foundational to biologics manufacturing; however, labor-intensive cell culture workflows predominate in the field. We created the CLAIRE (Cell Line AI Recognition and Evaluation) tool to streamline end-to-end cell line development by integrating deep-learning image analysis with automated liquid handling. We benchmarked three object detection models for monoclonality verification and found DETR provides superior accuracy (>0.90 F1-score) in identifying single cells. To quantify the outgrowth of cell lines, we evaluated multiple zero-shot SAM2 segmentation models against a feature-based estimation method. Feature-based detection successfully identified diverse cell colony types while less robust performance was observed for SAM2 models, particularly for sparse density colonies. The pre-trained DETR and feature-based detection models were wrapped in a task-focused user interface that outputs cell line hitpick lists compatible with a Lynx LM1800 liquid handler in addition to custom scripts automating cell passaging and sampling. This approach yielded an end-to-end 36-day CLD workflow capable of generating high-titer cell lines for multiple complex antibody structures. Here, we open-access our trained models, user interface, and Lynx automation scripts to provide a modular toolkit useful for clonal cell line engineering projects.
Zhou, Q.; Chomicz, D.; Melvin, D.; Griffiths, M.; Yahiya, S.; Reece, S.; Le Pannerer, M.-M.; Krawczyk, K.
Preclinical antibody discovery relies on progressive screening and down-selection of candidate antibodies from large immune repertoires, yet this critical process is poorly represented in existing public databases. Here we introduce KyDab (Kymouse Antibody Database), a well-curated database of antibody discovery selection data generated using standardized workflows on the Kymouse humanized mouse platform. The current release includes 11 Kymouse platform mouse immunisation studies covering 51 immunogens, more than 120,000 paired heavy-light chain sequences, and binding measurements for a selected subset of experimentally characterized clones. By capturing full-funnel selection data with consistent metadata and both positive and negative experimental outcomes, KyDab provides a valuable data resource for the development and evaluation of artificial intelligence models for antibody discovery. KyDab is accessible at https://kydab.naturalantibody.com, and the database will be continuously updated as new datasets become available.
Park, E.; Izadi, S.
Understanding the molecular surface properties of monoclonal antibodies (mAbs) is crucial for determining their function, affinity, and developability. Yet, robust methods to accurately represent the key structural and biophysical features of mAbs on their molecular surface are still limited. Here, we introduce MolDesk, a set of molecular surface descriptors specifically designed for predicting antibody developability characteristics. We assess the performance of these descriptors by directly benchmarking their correlations with an extensive array of in vitro and in vivo data, including viscosity at high concentration, aggregation, hydrophobic interaction chromatography (HIC), human pharmacokinetic (PK) clearance, heparin retention time, and polyspecificity. Additionally, we investigate the sensitivity of these surface descriptors to methodological nuances, such as the choice of interior dielectric constant for electrostatic potential calculations, residue-level hydrophobicity scales, initial antibody structure models, and the impact of conformational sampling. Based on our benchmarking analysis, we propose six in silico developability rules that leverage these molecular surface descriptors and demonstrate their superior ability to predict the clinical progression of therapeutic antibodies compared to established models like TAP.
Chen, X.; Dougherty, T.; Hong, C.; Schibler, R.; Zhao, Y. C.; Sadeghi, R.; Matasci, N.; Wu, Y.-C.; Kerman, I.
Antibodies are prominent therapeutic agents but costly to develop. Existing approaches to predict developability depend on structure, which requires expensive laboratory or computational work to obtain. To address this issue, we present a machine learning pipeline to predict developability from sequence alone using physicochemical and learned embedding features. Our approach achieves high sensitivity and specificity on a dataset of 2400 antibodies. These results suggest that sequence is predictive of developability, enabling more efficient development of antibodies.
Gordon, G. L.; Gervasio, J.; Souders, C.; Deane, C. M.
Developability optimisation is an important step for successful biotherapeutic design. For monoclonal antibodies, developability is relatively well characterised. However, progress for novel biotherapeutics such as nanobodies is more limited. Differences in structural features between antibodies and nanobodies render current antibody computational methods unsuitable for direct application to nanobodies. Following the principles of the Therapeutic Antibody Profiler (TAP), we have built the Therapeutic Nanobody Profiler (TNP), an open-source computational tool for predicting nanobody developability. Tailored specifically for nanobodies, it accounts for their unique properties compared to conventional antibodies for more efficient development of this novel therapeutic format. We calibrate TNP metrics using the 36 currently available clinical-stage nanobody sequences. We also collected experimental developability data for 108 nanobodies and examine how these results are related to the TNP guidelines. TNP is available as a web application at opig.stats.ox.ac.uk/webapps/tnp.
Talaei, M.; Walker, K. C.; Hao, B.; Jolley, E.; Jin, Y.; Kozakov, D.; Misasi, J.; Vajda, S.; Paschalidis, I. C.; Joseph-McCarthy, D.
Antibodies are a leading class of biologics, yet their architecture with conserved framework regions and hypervariable complementarity-determining regions (CDRs) poses unique challenges for computational modeling. We present a region-aware pretraining strategy for paired heavy (VH) and light (VL) sequences in variable domains using ESM2-3B and ESM C (600M) protein language models. We compare three masking strategies: whole-chain, CDR-focused, and a hybrid approach. Through evaluation on binding affinity datasets spanning single-mutant panels and combinatorial mutants, we demonstrate that CDR-focused training produces superior embeddings for functional prediction. Notably, training only on VH-VL pairs proves sufficient, eliminating the need for massive unpaired pretraining that provides no measurable downstream benefit. Our compact 600M ESM C model achieves state-of-the-art performance, matching or exceeding larger antibody-specific baselines. These findings establish a principled framework for antibody language models: prioritize paired sequences with CDR-aware supervision over scale and complex training curricula to achieve both computational efficiency and predictive accuracy.
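The three masking strategies named in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the preprint: the function names, the 15% masking rate, the 50/50 CDR/framework split used for the "hybrid" case, the toy sequence, and the CDR spans (which are arbitrary index ranges, not IMGT or Kabat numbering).

```python
import random

MASK = "<mask>"  # hypothetical mask token


def choose_mask_positions(length, cdr_spans, strategy, rate=0.15,
                          cdr_share=0.5, rng=None):
    """Pick positions to mask under one of three illustrative strategies.

    cdr_spans: list of (start, end) half-open index ranges (assumed input).
    cdr_share: fraction of masks placed in CDRs for the assumed 'hybrid' mode.
    """
    rng = rng or random.Random()
    cdr = sorted({i for start, end in cdr_spans for i in range(start, end)})
    cdr_set = set(cdr)
    framework = [i for i in range(length) if i not in cdr_set]
    n_mask = max(1, round(rate * length))

    if strategy == "whole-chain":
        # Uniform over every residue, ignoring region boundaries.
        return sorted(rng.sample(range(length), n_mask))
    if strategy == "cdr-focused":
        # All masked positions fall inside CDR spans.
        return sorted(rng.sample(cdr, min(n_mask, len(cdr))))
    if strategy == "hybrid":
        # Split the masking budget between CDR and framework positions.
        n_cdr = min(len(cdr), round(cdr_share * n_mask))
        n_fw = min(len(framework), n_mask - n_cdr)
        return sorted(rng.sample(cdr, n_cdr) + rng.sample(framework, n_fw))
    raise ValueError(f"unknown strategy: {strategy!r}")


def apply_mask(tokens, positions):
    """Return a copy of tokens with the chosen positions replaced by MASK."""
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked


# Toy example: arbitrary 40-residue string with arbitrary "CDR" spans.
toy_seq = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQA"
toy_cdrs = [(10, 15), (25, 32)]
rng = random.Random(0)
cdr_positions = choose_mask_positions(len(toy_seq), toy_cdrs, "cdr-focused", rng=rng)
hybrid_positions = choose_mask_positions(len(toy_seq), toy_cdrs, "hybrid", rng=rng)
masked_tokens = apply_mask(toy_seq, cdr_positions)
```

The only difference between the strategies in this sketch is where the masking budget is spent; the language model and training loop are out of scope here.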
Beasley, M. D.; Aracic, S.; Gracey, F. M.; Kannan, R.; Masarati, A.; Premaratne, S. R.; Udawela, M.; Wood, R. E.; Jabar, S.; Church, N.; Le, T.-K.; Makris, D.; McColl, B. K.; Kiefel, B. R.
Antibodies with high affinity against the receptor binding domain (RBD) of the SARS-CoV-2 S1 ectodomain were identified from screens using the Retained Display (ReD) platform employing a 1 × 10^11 clone single-chain antibody (scFv) library. Numerous unique scFv clones capable of inhibiting binding of the viral S1 ectodomain to the ACE2 receptor in vitro were characterized. To maximize avidity, selected clones were reformatted as bivalent diabodies and monoclonal antibodies (mAb). The highest affinity mAb completely neutralized live SARS-CoV-2 virus in cell culture for four days at a concentration of 6.7 nM, suggesting potential therapeutic and/or prophylactic use. Furthermore, scFvs were identified that greatly increased the interaction of the viral S1 trimer with the ACE2 receptor, with potential implications for vaccine development.
Cook, R. L.; Martelly, W.; Agu, C. V.; Gushgari, L. R.; Moreno, S.; Kesiraju, S.; Mohan, M.; Takulapalli, B.
Drug discovery continues to face a staggering 90% failure rate, with many setbacks occurring during late-stage clinical trials. To address this challenge, there is an increasing focus on developing and evaluating new technologies to enhance the "design" and "test" phases of antibody-based drugs (e.g., monoclonal antibodies, bispecifics, CAR-T therapies, ADCs) and biologics during early preclinical development, with the goal of identifying lead molecules with a higher likelihood of clinical success. Artificial intelligence (AI) is becoming an indispensable tool in this domain, both for improving molecules identified through traditional approaches and for the de novo design of novel therapeutics. However, critical bottlenecks persist in the "build" and "test" phases of AI-designed antibodies and protein binders, impeding early preclinical evaluation. While AI models can rapidly generate thousands to millions of putative drug designs, technological and cost limitations mean that only a few dozen candidates are typically produced and tested. Drug developers often face a tradeoff between ultra-high-throughput wet lab methods that provide binary yes/no binding data and biophysical methods that offer detailed characterization of a limited number of drug-target pairs. To address these bottlenecks, we previously reported the development of the Sensor-integrated Proteome On Chip (SPOC®) platform, which enables the production and capture-purification of 1,000 to 2,400 folded proteins directly onto a surface plasmon resonance (SPR) biosensor chip for measuring kinetic binding rates with picomolar affinity resolution. In this study, we extend the SPOC technology to the expression of single-chain antibodies (sc-antibodies), specifically scFv and VHH, and dual-chain Fab constructs. We demonstrate that these proteins are capture-purified at high levels on SPR biosensors and retain functionality as shown by the binding specificity to their respective target antigens, with affinities comparable to those reported in the literature. SPOC outputs comprehensive kinetic data including quantitative binding (Rmax), on-rate (ka), off-rate (kd), affinity (KD), and half-life (t1/2), for each of thousands of on-chip sc-antibodies. Additionally, we present a case study showcasing single amino acid mutational scan of the complementarity-determining regions (CDRs) of a HER2 VHH (nanobody) paratope. Using 92 unique mutated variants from four different amino acid substitutions, we pinpoint critical residues within the paratope that could further enhance binding affinity. This study serves as a demonstration of a novel high-throughput approach for biophysical screening of hundreds to thousands of single chain antibody sequences in a single assay, generating high affinity resolution kinetic data to support antibody discovery and AI-enabled pipelines.
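The kinetic quantities listed in this abstract are related by standard expressions when a simple 1:1 binding model is assumed: KD = kd / ka and t1/2 = ln(2) / kd. The sketch below shows that arithmetic only. The function name and the example rate constants are hypothetical, and nothing here reflects how the SPOC platform itself fits its sensorgrams.

```python
import math


def kinetic_summary(ka: float, kd: float) -> dict:
    """Derive affinity and complex half-life from rate constants.

    Assumes a 1:1 binding model.
    ka: association rate constant in 1/(M*s)
    kd: dissociation rate constant in 1/s
    """
    if ka <= 0 or kd <= 0:
        raise ValueError("ka and kd must be positive")
    return {
        "KD_M": kd / ka,                  # equilibrium dissociation constant, M
        "half_life_s": math.log(2) / kd,  # dissociation half-life, s
    }


# Hypothetical rate constants, chosen only to make the arithmetic easy to follow.
example = kinetic_summary(ka=1.0e5, kd=1.0e-3)
print(f"KD = {example['KD_M']:.2e} M, t1/2 = {example['half_life_s']:.0f} s")
```

With these illustrative inputs the affinity works out to 10 nM and the half-life to roughly 693 seconds, which shows why a slower off-rate improves both numbers at once.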