mAbs — Latest Matching Preprints

1

Better antibodies engineered with a GLIMPSE of human data

Hepler, N. L.; Hill, A. J.; Jaffe, D. B.; Gibbons, M. C.; Pfeiffer, K. A.; Hilton, D. M.; Freeman, M.; McDonnell, W. J.

2025-06-11 bioinformatics 10.1101/2025.06.08.658113 medRxiv

Top 0.1%

88.6%

GLIMPSE-1 is a protein language model trained solely on paired human antibody sequences. It captures immunological features and achieves best-in-class performance in humanization benchmarks. We demonstrate the utility of GLIMPSE-1 in humanization; engineering of antibodies for affinity, species cross-reactivity, and key developability parameters; and the creation of highly divergent functional variants with <90% sequence identity to a marketed antibody. Learning exclusively from human antibody data enables GLIMPSE-1 to enhance therapeutics and native antibodies based on patterns in the human repertoire. Disclaimer: While we provide detailed descriptions of experimental methods and success metrics, certain methodological details of GLIMPSE-1 remain proprietary and/or redacted in this work for commercial considerations. We warmly invite researchers and potential collaborators interested in accessing GLIMPSE-1 to connect with our team via partnerships@infinimmune.com.

2

The Therapeutic Nanobody Profiler: characterising and predicting nanobody developability to improve therapeutic design

Gordon, G. L.; Gervasio, J.; Souders, C.; Deane, C. M.

2025-08-14 bioinformatics 10.1101/2025.08.11.669635 medRxiv

Top 0.1%

80.1%

Developability optimisation is an important step for successful biotherapeutic design. For monoclonal antibodies, developability is relatively well characterised. However, progress for novel biotherapeutics such as nanobodies is more limited. Differences in structural features between antibodies and nanobodies render current antibody computational methods unsuitable for direct application to nanobodies. Following the principles of the Therapeutic Antibody Profiler (TAP), we have built the Therapeutic Nanobody Profiler (TNP), an open-source computational tool for predicting nanobody developability. Tailored specifically for nanobodies, it accounts for their unique properties compared to conventional antibodies for more efficient development of this novel therapeutic format. We calibrate TNP metrics using the 36 currently available clinical-stage nanobody sequences. We also collected experimental developability data for 108 nanobodies and examine how these results are related to the TNP guidelines. TNP is available as a web application at opig.stats.ox.ac.uk/webapps/tnp.

3

Agent-Guided De Novo Design of Nanobody Binders Against a Novel Cancer Target

Zhao, Y.; Yilmaz, M.; Lee, E.; Teh, C.; Guo, L.; Sonmez, K.; Giancardo, L.; Trang, G.; Xu, F.; Espinosa-Cotton, M.; Cheung, N.-K.; Kim, J.; Cheng, X.

2026-04-17 bioinformatics 10.64898/2026.04.13.717816 medRxiv

Top 0.1%

77.7%

Therapeutic antibody discovery remains slow and resource-intensive, with traditional methods providing limited control over epitope selection. We present a workflow for de novo nanobody design applied to a novel Desmoplastic Small Round Cell Tumor target encompassing four stages: (1) epitope identification guided by our hotspot recommendation agent using physical chemistry-based structure and sequence analysis tools with two curated databases (IEDB, PFAM), (2) de novo nanobody generation using three independent methods (RFantibody, IgGM, mBER) across multiple predicted antigen structures and nanobody frameworks, (3) multi-metric scoring including structural metrics from folding models, and in silico binding affinity from our sequence-based predictor, (4) high-throughput yeast surface display (YSD) screening followed by surface plasmon resonance (SPR) characterization of the specific binders. We generated 288,000 nanobody designs spanning eight target epitope regions and three variable domains of heavy chain-only antibody (VHH) frameworks. Multi-objective Pareto filtering with our candidate selection agent yielded 100,000 candidates for YSD screening with fluorescence-activated cell sorting (FACS). Of 116 enriched candidates advanced to SPR characterization, 46/116 (39.7%) produced reliable kinetic fits with Rmax ≥ 30 RU, yielding KD values from 0.66 nM to 305 nM (median 31.7 nM). These results show that an agent-guided computational workflow can design nanomolar to sub-nanomolar nanobody binders against a novel target without experimental structure or prior antibody information.
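The multi-objective Pareto filtering step named in this abstract can be illustrated with a minimal sketch. The abstract does not say which objectives or selection rules the authors' agent uses, so the design names and the two score columns below (a structural-confidence score and a predicted-affinity score, both treated as higher-is-better) are hypothetical. The code only shows the generic idea of keeping non-dominated designs.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of designs that no other design dominates."""
    names = list(scores)
    return [
        n for n in names
        if not any(dominates(scores[m], scores[n]) for m in names if m != n)
    ]


# Hypothetical designs: (structural confidence, predicted affinity score).
designs = {
    "vhh_a": (0.92, 0.80),
    "vhh_b": (0.85, 0.90),
    "vhh_c": (0.80, 0.70),  # worse than vhh_a on both objectives
    "vhh_d": (0.95, 0.60),
}

front = pareto_front(designs)
print(front)
```

This pairwise check is quadratic in the number of designs, which is fine for illustration; a library of hundreds of thousands of designs would likely call for a sort-based or batched approach.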

4

Designing Feature-Controlled Humanoid Antibody Discovery Libraries Using Generative Adversarial Networks

Amimeur, T.; Shaver, J. M.; Ketchem, R. R.; Taylor, J. A.; Clark, R. H.; Smith, J.; Van Citters, D.; Siska, C. C.; Smidt, P.; Sprague, M.; Kerwin, B. A.; Pettit, D.

2020-04-13 immunology 10.1101/2020.04.12.024844 medRxiv

Top 0.1%

74.6%

We demonstrate the use of a Generative Adversarial Network (GAN), trained from a set of over 400,000 light and heavy chain human antibody sequences, to learn the rules of human antibody formation. The resulting model surpasses common in silico techniques by capturing residue diversity throughout the variable region, and is capable of generating extremely large, diverse libraries of novel antibodies that mimic somatically hypermutated human repertoire response. This method permits us to rationally design de novo humanoid antibody libraries with explicit control over various properties of our discovery library. Through transfer learning, we are able to bias the GAN to generate molecules with key properties of interest such as improved stability and developability, lower predicted MHC Class II binding, and specific complementarity-determining region (CDR) characteristics. These approaches also provide a mechanism to better study the complex relationships between antibody sequence and molecular behavior, both in vitro and in vivo. We validate our method by successfully expressing a proof-of-concept library of nearly 100,000 GAN-generated antibodies via phage display. We present the sequences and homology-model structures of example generated antibodies expressed in stable CHO pools and evaluated across multiple biophysical properties. The creation of discovery libraries using our in silico approach allows for the control of pharmaceutical properties such that these therapeutic antibodies can provide a more rapid and cost-effective response to biological threats.

5

Computational design of developable therapeutic antibodies: efficient traversal of binder landscapes and rescue of escape mutations

Dreyer, F. A.; Schneider, C.; Kovaltsuk, A.; Cutting, D.; Byrne, M. J.; Nissley, D. A.; Wahome, N.; Kenlay, H.; Marks, C.; Errington, D.; Gildea, R. J.; Damerell, D.; Tizei, P.; Bunjobpol, W.; Darby, J. F.; Drulyte, I.; Hurdiss, D. L.; Surade, S.; Pires, D. E. V.; Deane, C. M.

2024-10-04 bioinformatics 10.1101/2024.10.03.616038 medRxiv

Top 0.1%

64.1%

Developing therapeutic antibodies is a challenging endeavour, often requiring large-scale screening to produce initial binders that often still require optimisation for developability. We present a computational pipeline for the discovery and design of therapeutic antibody candidates, which incorporates physics- and AI-based methods for the generation, assessment, and validation of developable candidate antibodies against diverse epitopes, via efficient few-shot experimental screens. We demonstrate that these orthogonal methods can lead to promising designs. We evaluated our approach by experimentally testing a small number of candidates against multiple SARS-CoV-2 variants in three different tasks: (i) traversing sequence landscapes of binders, we identify highly sequence dissimilar antibodies that retain binding to the Wuhan strain, (ii) rescuing binding from escape mutations, we show up to 54% of designs gain binding affinity to a new subvariant and (iii) improving developability characteristics of antibodies while retaining binding properties. These results together demonstrate an end-to-end antibody design pipeline with applicability across a wide range of antibody design tasks. We experimentally characterised binding against different antigen targets, developability profiles, and cryo-EM structures of designed antibodies. Our work demonstrates how combined AI and physics computational methods improve productivity and viability of antibody designs.

6

Drug-like antibody design against challenging targets with atomic precision

Chai Discovery Team; Boitreaud, J.; Chen, R.; Dent, J.; Fairweather, L.; Geisz, D.; Greenig, M.; Boyd, N.; Jain, J.; Johnston, B.; McPartlon, M.; Meier, J.; Patil, N.; Qiao, Z.; Rollins, N.; Vicas, N.; Wollenhaupt, P.; Wu, K.; Yeung, A.

2025-12-01 molecular biology 10.1101/2025.11.29.691346 medRxiv

Top 0.1%

60.0%

Computational antibody design has seen rapid progress, with high success rates enabling direct translation to characterization without any high-throughput screening required. In this work, we markedly expand the scope of de novo antibody design by applying our state-of-the-art Chai-2 platform to design drug-like antibodies in full-length monoclonal format. We find that >86% of these full-length mAbs have strong developability profiles on par with therapeutic antibodies. We further show that experimentally determined structures of Chai-2 designs closely match their in silico predictions, demonstrating that Chai-2 produces atomically accurate models of designed antibodies. Building on these foundational capabilities, we showcase two potential applications of Chai-2 against different targets: designing functional antibodies mediating GPCR agonism, and highly specific antibodies selectively binding tumor-specific neoepitopes. Taken together, this work brings new flexibility to modern discovery pipelines, accelerating the path from in silico design to functional validation across both conventional and challenging targets. Beyond reducing the cost and timelines associated with large screening campaigns, in silico design can now open new frontiers for creative, targeted therapeutics that address unmet clinical needs.

7

KyDab - a comprehensive database of antibody discovery selection campaigns.

Zhou, Q.; Chomicz, D.; Melvin, D.; Griffiths, M.; Yahiya, S.; Reece, S.; Le Pannerer, M.-M.; Krawczyk, K.

2026-03-27 bioinformatics 10.64898/2026.03.25.713450 medRxiv

Top 0.1%

55.3%

Preclinical antibody discovery relies on progressive screening and down-selection of candidate antibodies from large immune repertoires, yet this critical process is poorly represented in existing public databases. Here we introduce KyDab (Kymouse Antibody Database), a well-curated database of antibody discovery selection data generated using standardized workflows on the Kymouse humanized mouse platform. The current release includes 11 immunisation studies in Kymouse platform mice covering 51 immunogens, more than 120,000 paired heavy-light chain sequences, and binding measurements for a selected subset of experimentally characterized clones. By capturing full-funnel selection data with consistent metadata and both positive and negative experimental outcomes, KyDab provides a valuable data resource for the development and evaluation of artificial intelligence models for antibody discovery. KyDab is accessible at https://kydab.naturalantibody.com, and the database will be continuously updated as new datasets become available.

8

LICHEN: Light-chain Immunoglobulin sequence generation Conditioned on the Heavy chain and Experimental Needs

Capel, H. L.; Ellmen, I.; Murray, C. J.; Mignone, G.; Black, M.; Clarke, B.; Breen, C.; Tierney, S.; Dougan, P.; Buick, R. J.; Greenshields-Watson, A.; Deane, C. M.

2025-08-07 bioinformatics 10.1101/2025.08.06.668938 medRxiv

Top 0.1%

54.8%

In developing therapeutic antibodies, the heavy chain is often prioritised due to its higher variability and its central role in antigen binding. An appropriate pairing of the light sequence is, however, important for antibody function. Here we present LICHEN, a heavy chain conditioned light sequence generation tool that enables collaborative light sequence design by leveraging computational capabilities alongside experimental expertise. LICHEN generates light sequences which are valid (antibody-like), diverse in sequence and structure, and conditioned on a specific heavy chain. LICHEN can also condition on germline and CDRs and automatically filter generated sequences for required properties. This allows LICHEN to be used across multiple antibody development use cases. We carry out experimental validation of the method conditioning only on the heavy sequence, and on both the heavy sequence and binding information. Our in vitro results show that sequences created by LICHEN have effective expression yields and can retain antigen binding.

9

Antibody optimization enabled by artificial intelligence predictions of binding affinity and naturalness

Bachas, S.; Rakocevic, G.; Spencer, D.; Sastry, A. V.; Haile, R.; Sutton, J. M.; Kasun, G.; Stachyra, A.; Gutierrez, J. M.; Yassine, E.; Medjo, B.; Blay, V.; Kohnert, C.; Stanton, J. T.; Brown, A.; Tijanic, N.; McCloskey, C.; Viazzo, R.; Consbruck, R.; Carter, H.; Levine, S.; Abdulhaqq, S.; Shaul, J.; Ventura, A. B.; Olson, R. S.; Yapici, E.; Meier, J.; McClain, S.; Weinstock, M.; Hannum, G.; Schwartz, A.; Gander, M.; Spreafico, R.

2022-08-17 bioinformatics 10.1101/2022.08.16.504181 medRxiv

Top 0.1%

53.2%

Traditional antibody optimization approaches involve screening a small subset of the available sequence space, often resulting in drug candidates with suboptimal binding affinity, developability or immunogenicity. Based on two distinct antibodies, we demonstrate that deep contextual language models trained on high-throughput affinity data can quantitatively predict binding of unseen antibody sequence variants. These variants span a KD range of three orders of magnitude over a large mutational space. Our models reveal strong epistatic effects, which highlight the need for intelligent screening approaches. In addition, we introduce the modeling of "naturalness", a metric that scores antibody variants for similarity to natural immunoglobulins. We show that naturalness is associated with measures of drug developability and immunogenicity, and that it can be optimized alongside binding affinity using a genetic algorithm. This approach promises to accelerate and improve antibody engineering, and may increase the success rate in developing novel antibody and related drug candidates.

10

Benchmarking Generative Large Language Models for de novo Antibody Design and Agentic Evaluation

Hossain, D.; Abir, F. A.; Zhang, S.; Chen, J. Y.

2026-04-21 bioinformatics 10.64898/2026.04.18.716776 medRxiv

Top 0.1%

52.8%

Despite major advances in computational antibody engineering, no systematic comparison of modern open-source LLM backbone families for antibody sequence generation exists, nor is it known whether architectural differences matter at compact model scales. In this study, five compact transformer variants inspired by prominent open-source LLM families (Llama-4, Gemma-3, DeepSeek-V3, Mistral 7B, and NVIDIA Nemotron-3) were customized and trained from scratch for de novo VH single-domain antibody (sdAb) design. All five models were pretrained on 15 million sequences from the Observed Antibody Space (OAS) database. Pretraining yielded uniformly high generative fidelity across architectures: sequence diversity 0.507-0.516 (CV=0.8%), uniqueness approaching 1.0, and novelty 0.925-0.977 (CV=2.2%). The models were subsequently fine-tuned on disease-stratified repertoires spanning SARS-CoV-2 (n=4,688), HIV (n=430), HER2 (n=22,778), and Ebola virus (n=2,868). Structural assessment of top-ranked candidates from those case studies via AlphaFold-2, Boltz-2, RoseTTAFold-2, and ESMFold produced mean pLDDT scores of 92.88 ± 1.54 to 93.77 ± 2.16, with no statistically significant inter-model differences (Kruskal-Wallis H=2.06, p>0.05; N=100). In this single-seed experiment at compressed scale, the result suggests that generative capacity in this parameter regime is primarily determined by training data and model scale rather than family-specific design elements. Computational docking yielded predicted binding free energies of -36.34 to -65.60 kcal/mol; independent biological rigor validation through IMGT-defined CDR-H3 extraction, BLASTp novelty assessment, and NetMHCIIpan 4.3 MHC-II immunogenicity profiling collectively confirmed antigen-binding loop novelty (CDR-H3 identity 0-29% to closest database hits), germline-consistent humanness (77-90% VH germline content), and immunogenically silent antigen-binding surfaces with no strong MHC-II binders detected across CDR regions in any candidate. We further introduce a proof-of-concept agentic evaluation pipeline leveraging the Model Context Protocol (MCP) with Claude Sonnet 4.6, enabling automated structural profiling and candidate prioritization across disease targets.

11

solPredict: Antibody apparent solubility prediction from sequence by transfer learning

Feng, J.; Jiang, M.; Shih, J.; Chai, Q.

2021-12-09 bioinformatics 10.1101/2021.12.07.471655 medRxiv

Top 0.1%

52.5%

There is growing interest in developing therapeutic mAbs for subcutaneous administration for several reasons, including patient convenience and compliance. This requires identifying mAbs with superior solubility that are amenable to high-concentration formulation development. However, early selection of developable antibodies with optimal high-concentration attributes remains challenging. Since experimental screening is often material and labor intensive, there is significant interest in developing robust in silico tools capable of screening thousands of molecules based on sequence information alone. In this paper, we present a strategy applying protein language modeling, named solPredict, to predict the apparent solubility of mAbs in histidine (pH 6.0) buffer condition. solPredict feeds embeddings extracted from a pretrained protein language model, computed from single sequences, into a shallow neural network. A dataset of 220 diverse, in-house mAbs, with extrapolated protein solubility data obtained from the PEG-induced precipitation method, was used for model training and hyperparameter tuning through five-fold cross validation. An independent test set of 40 mAbs was used for model evaluation. solPredict achieves high correlation with experimental data (Spearman correlation coefficient = 0.86, Pearson correlation coefficient = 0.84, R2 = 0.69, and RMSE = 4.40). The output from solPredict directly corresponds to experimental solubility measurements (PEG %) and enables quantitative interpretation of results. This approach eliminates the need for 3D structure modeling of mAbs, descriptor computation, and expert-crafted input features. The minimal computational expense of solPredict enables rapid, large-scale, and high-throughput screening of mAbs during early antibody discovery.
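For readers who want to see how the agreement metrics quoted in this abstract (Spearman, Pearson, RMSE) are typically computed, here is a minimal pure-Python sketch. The measured and predicted values are made-up illustrative numbers, not data from the paper, and the ranking helper assumes there are no tied values.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measurements and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical solubility values in PEG %, for illustration only.
measured = [10.0, 14.0, 18.0, 22.0, 26.0]
predicted = [11.0, 13.0, 19.0, 21.0, 27.0]

print("Spearman:", spearman(measured, predicted))
print("Pearson:", pearson(measured, predicted))
print("RMSE:", rmse(measured, predicted))
```

Because RMSE is expressed in the same units as the measurement, the reported RMSE of 4.40 reads directly as an error in PEG %.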

12

Prediction of Antibody Non-Specificity using Protein Language Models and Biophysical Parameters

Sakhnini, L. I.; Beltrame, L.; Fulle, S.; Sormanni, P.; Henriksen, A.; Lorenzen, N.; Vendruscolo, M.; Granata, D.

2025-05-01 bioinformatics 10.1101/2025.04.28.650927 medRxiv

Top 0.1%

52.4%

The development of therapeutic antibodies requires optimizing target binding affinity and pharmacodynamics, while ensuring high developability potential, including minimizing non-specific binding. In this study, we address this problem by predicting antibody non-specificity using two complementary approaches: (i) antibody sequence embeddings from protein language models (PLMs), and (ii) a comprehensive set of sequence-based biophysical descriptors. These models were trained on human and mouse antibody data from Boughter et al. (2020) and tested on three public datasets: Jain et al. (2017), Shehata et al. (2019) and Harvey et al. (2022). We show that non-specificity is best predicted from the heavy variable domain and heavy-chain complementarity-determining regions (CDRs). The top performing PLM, a heavy variable domain-based ESM 1v LogisticReg model, resulted in 10-fold cross-validation accuracy of up to 71%. Our biophysical descriptor-based analysis identified the isoelectric point as a key driver of non-specificity. Our findings underscore the importance of biophysical properties in predicting antibody non-specificity and highlight the potential of protein language models for the development of antibody-based therapeutics. To illustrate the use of our approach in the development of lead candidates with high developability potential, we show that it can be extended to therapeutic antibodies and nanobodies.

13

HyperBind2: Multi-Shot Learning Enables Progressive Improvement in Computational Antibody Discovery

Dell'uomo, D.; Satz, A.; Averso, B.

2025-11-06 bioengineering 10.1101/2025.11.06.687005 medRxiv

Top 0.1%

50.3%

Antibody discovery remains constrained by resource-intensive experimental screening approaches that offer limited control over critical properties. Here we present HyperBind2, a machine learning platform that progressively improves antibody-antigen interaction predictions through experimental feedback cycles. Unlike static or zero-shot computational approaches, HyperBind2 employs multi-shot learning, which adapts to target-specific patterns using minimal experimental data (10-20 validated binders/non-binders). The platform requires only the target's primary sequences as input (no experimental structures are required), embedding antibody and antigen sequences into a shared representation space where binding affinity is modeled as a learned geometric relationship. HyperBind2 was validated at multiple independent academic and commercial labs. One such experiment validated HyperBind2 on a challenging multi-pass membrane receptor target through three iterative design-test cycles. Starting from an initial screening of 100 million candidates completed within 48 hours, model accuracy improved from 65% to 85% across three rounds of lab-to-AI feedback. By round 3, HyperBind2 achieved a 21% experimental success rate, with 20 of 96 tested candidates demonstrating KD ≤ 100 nM, including 3 with sub-10 nM affinities. HyperBind2 spans multiple therapeutic formats including scFvs, VHHs, and full-length IgGs, with preliminary research extending to CAR-T, BiTE, and bispecific formats. HyperBind2 establishes an efficient digital-experimental workflow that reduces laboratory resources and screening time while maintaining high hit rates. By combining massive computational pre-screening with targeted experimental validation and continuous model refinement, HyperBind2 significantly reduces experimental burden while accelerating the identification of therapeutic quality antibody candidates. HyperBind2 is available as open source for academic research or via a commercial platform (abtique.com), which provides lab-ready antibody sequences with no coding required. Disclaimer: Due to the sensitive nature of intellectual property and confidentiality agreements, target identities have been anonymized where research is ongoing or proprietary. While we provide detailed descriptions of experimental methods, not all details can be disclosed. The results have been independently validated by third parties, enhancing their reliability.

14

OpenGerminal: an open-source implementation of the Germinal antibody design pipeline

Han, B.; Li, S.

2026-06-29 bioinformatics 10.64898/2026.06.25.734527 medRxiv

Top 0.1%

44.7%

Germinal is a recently described computational pipeline for de novo antibody design that combines AlphaFold-Multimer hallucination with antibody language model guidance to generate epitope-targeted antibodies. Germinal identified binders with nanomolar-to-low-micromolar affinities by testing only 43-101 designs per target across four diverse antigens, establishing it as a practical tool for epitope-directed antibody design accessible to standard academic laboratories. As this architecture is itself very recent, systematic replacement and benchmarking of its individual components remains largely unexplored, yet offers a valuable opportunity to probe the robustness of the underlying design. We present OpenGerminal, which replaces PyRosetta with a fully open-source stack comprising OpenMM 8.5.1, FreeSASA, FASPR, Biopython, and sc-rs v1.0.0, and adopts AbLang1 (ablang2 v0.2.1) as the sole antibody language model in place of IgLM. Benchmarking on two VHH targets (PD-L1 and IL-3) reveals that OpenGerminal achieves a markedly higher cofolding pass rate (PD-L1: 33.7% vs. 18.6%; IL-3: 24.6% vs. 8.0%) with equivalent or improved Chai-1 structural confidence metrics in accepted designs, at the cost of a modest increase in per-trajectory computation time (>=1.5x). Multi-chain target support is also extended and verified to run without error on the official insulin example. OpenGerminal provides the first systematic benchmarking of IgLM versus AbLang1 within the Germinal architecture, and its fully open-source component stack broadens the range of deployment contexts in which the pipeline can be used.

15

Predicting the Purity of Multispecific Antibodies From Sequence Using Machine Learning: Methods and Applications

Mazurek, A. S.; Davis, A.; Tsang, K.; Rivera, J.; Huang, Z.-F.; Holt, J.; Comeau, S. R.; Kumar, S.; Kasturirangan, S.

2023-12-07 molecular biology 10.1101/2023.12.05.570217 medRxiv

Top 0.1%

40.9%

Multispecific antibodies are prominent therapeutic agents, but many molecular formats and drug candidates that show promise during molecular discovery stages cannot be scaled up and developed into drugs due to inadequate developability. During the discovery stages, the selection of molecule format(s), molecule design, purity, and initial physicochemical stability testing criteria largely rely on scientists' experience. Machine learning, however, can identify hidden trends in large datasets, aiding in the selection of drug candidates with improved developability. In this study, we present a machine learning approach to predict antibody purity, measured by the percentage of monomer after protein A purification. Using the amino acid sequences of variable regions, molecular formats, germlines and germline pairings, and calculated physicochemical properties as inputs, machine learning models were trained to predict the percentage of monomer for a given multispecific antibody (Figure 1). The dataset employed in this study consists of ~500 multispecific antibodies generated during BI's internal drug discovery programs. Our results indicate that machine learning, when applied to sequence, germline, and format data, can effectively predict antibody percentage of monomer. Incorporating this approach into high-throughput multispecific antibody screening processes can save time and resources by reducing the need to test a large subset of potentially unstable antibodies. While this study focused on percentage of monomer as a test case, similar approaches can be employed to predict other antibody properties, such as melting temperature (Tm), hydrophobicity (aHIC), and solution stability properties (AC-SINS).
Figure 1: Overview of ML model for predicting multispecific antibody purity from sequence, germline and format information.

16

Antibody immunogenicity prediction and optimization with ImmunoSeq

Huang, Q.; He, Y.; Liu, K.

2025-08-19 immunology 10.1101/2025.08.14.670305 medRxiv

Top 0.1%

40.1%

Therapeutic antibody development faces persistent immunogenicity challenges from anti-drug antibodies (ADA). Identifying peptide fragments presented by the major histocompatibility complex is the central challenge in predicting immunogenicity. Here, we present ImmunoSeq, an interpretable and applicable method for immunogenicity prediction. ImmunoSeq addresses this by deploying complementary k-mer (k=8-12) peptide libraries: a positive library of immunologically safe peptides from fragmented human proteins/antibodies, and a negative library of murine antibody fragments capturing evolutionary-selected immunogenic triggers. For candidate antibodies, we generate all possible k-mer peptides and compute the hit rate by summing positive hits (+1.0) and negative hits (-0.2 penalty), normalized against the total peptide number. A higher hit rate predicts lower ADA risk, with residue-level resolution enabling precise localization of immunogenic hotspots. ImmunoSeq demonstrated superior ADA correlation and humanness classification accuracy compared to deep learning models, while accurately predicting ADA reductions in humanization, enabling sufficient sequence optimization for humanness. By leveraging dual-library discrimination principles of self/non-self peptides, ImmunoSeq provides a robust, interpretable solution for immunogenicity prediction and sequence optimization.
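The hit-rate calculation described in this abstract can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' implementation: the function names, the toy sequence, and the toy libraries are hypothetical, and the sketch assumes that a peptide found in both libraries receives both the +1.0 credit and the 0.2 penalty.

```python
from typing import Iterable, Iterator, Set


def kmers(seq: str, k_values: Iterable[int] = range(8, 13)) -> Iterator[str]:
    """Yield every k-mer peptide of `seq` for k = 8..12 (default range)."""
    for k in k_values:
        for start in range(len(seq) - k + 1):
            yield seq[start:start + k]


def hit_rate(
    seq: str,
    positive: Set[str],
    negative: Set[str],
    pos_weight: float = 1.0,
    neg_penalty: float = 0.2,
) -> float:
    """Sum +1.0 per positive-library hit and -0.2 per negative-library hit,
    then divide by the total number of peptides generated from `seq`."""
    peptides = list(kmers(seq))
    if not peptides:
        return 0.0  # sequence shorter than the smallest k
    score = 0.0
    for peptide in peptides:
        if peptide in positive:
            score += pos_weight
        if peptide in negative:
            score -= neg_penalty
    return score / len(peptides)


# Toy 10-residue sequence: three 8-mers, two 9-mers, one 10-mer (6 peptides).
toy_seq = "ACDEFGHIKL"
toy_positive = {"ACDEFGHI", "CDEFGHIKL"}  # hypothetical "human-like" peptides
toy_negative = {"DEFGHIKL"}               # hypothetical "murine-like" peptide

rate = hit_rate(toy_seq, toy_positive, toy_negative)
print(round(rate, 3))  # (1.0 + 1.0 - 0.2) / 6 = 0.3
```

Storing the libraries as sets keeps each lookup constant time, so scoring a sequence scales with the number of peptides it produces.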

17

Baselining the Buzz. Trastuzumab-HER2 Affinity, and Beyond!

Chinery, L.; Hummer, A. M.; Mehta, B. B.; Akbar, R.; Rawat, P.; Slabodkin, A.; Le Quy, K.; Lund-Johansen, F.; Greiff, V.; Jeliazkov, J. R.; Deane, C. M.

2024-03-29 bioinformatics 10.1101/2024.03.26.586756 medRxiv

Top 0.1%

39.7%

Strong antibody-antigen binding is the primary consideration when developing an efficacious therapeutic antibody. In recent years, much work has been devoted to applying complex machine learning models to this cause, yet simple baselines are often lacking. Here, we show that the widely used sequence alignment method, BLOSUM, can yield diverse, binder-enriched libraries from a single starting antibody. Using Trastuzumab-HER2 as a model system, we experimentally validated 720 novel designs generated with five different computational methods using surface plasmon resonance. The BLOSUM substitution matrix outperformed all four deep learning design approaches tested, achieving an estimated minimum binder enrichment of 12.5% and producing nine sub-nanomolar binders. These results underscore the importance of comparing against simple baselines and set a benchmark to guide future computational antibody library design.

18

Benchmarking AI-Driven PTIm-mAb Across Eleven FDA-Approved Bispecific Antibodies: A Cross-Tool Validation Study

Addepalli, M. K.; Prattipati, M.

2026-07-10 bioinformatics 10.64898/2026.07.07.736933 medRxiv

Top 0.1%

39.5%

Background: Late-stage attrition in therapeutic antibody discovery is dominated by developability liabilities: aggregation, polyspecificity, charge-driven non-specific binding, and chain-mispairing artefacts. Bispecific antibodies amplify these risks because each additional binding arm adds a new biophysical envelope that must be jointly satisfied. The existing in-silico ecosystem addresses individual axes of this problem (humanization, structure prediction, single-metric developability scoring), but few platforms integrate them end-to-end. PTIm-mAb (SANSHI Bio Solutions Pvt Ltd) is a multi-objective, AI/ML-driven antibody design platform that jointly optimizes sequence liabilities, surface aggregation, charge balance, humanness, and predicted binding affinity, and recommends a bispecific architecture in a single workflow.
Methods: We applied PTIm-mAb to the published sequences of eleven FDA-approved bispecific antibodies using the platform's default-parameter Pareto-acceptance optimization loop, run to convergence or to the internal iteration ceiling, with no human curation between the platform run and the external profiler. Both wild-type and platform-optimized sequences were profiled independently with three publicly available developability tools: Aggrescan, CamSol, and the Therapeutic Antibody Profiler (TAP). Paired-sample tests (Wilcoxon signed-rank, exact binomial sign test, McNemar exact test) evaluated the direction and significance of changes.
Results: Across the 17 evaluable paired arms profiled by TAP, PTIm-mAb cleared four wild-type CDR-vicinity Positive Charge Patch (PPC) flags: Blinatumomab-Arm1 (1.9952 → 0.6885), Mosunetuzumab-Arm1 (1.3391 → 0.0568), Linvoseltamab-Arm2 (0.8060 → 0.0), and the headline Elranatamab-Arm1 case (1.7981 → 0.5799), the last achieved without trading off any other in-range metric and corroborated by Aggrescan and CamSol on the same arm. Total CDR length was significantly shortened across the cohort (Wilcoxon two-sided p = 0.0075, one-sided p = 0.0037, effect size r = 0.65), a significant improvement on the metric most directly under the optimizer's control. The directional shift on Aggrescan integrated aggregation propensity was also significant by sign test (24 of 36 chains improved, 2 unchanged, 10 worsened; p = 0.021). On the already-clean Zenocutuzumab profile the optimizer identified residual headroom (PPC 0.1191 → 0.0; SFvCSP 12.5 → 6.0), demonstrating that the platform's value extends to candidates that pass all flags. Three results (Teclistamab Arm-1, Emicizumab, and Talquetamab Arm-2) did not clear all flags and are presented as candidates for iterative re-invocation of the platform pipeline on the optimized output (planned follow-up; Section 5). The remaining TAP metrics (PSH, PPC magnitude, PNC, |SFvCSP|) trended in the improvement direction without reaching significance in this cohort, a pattern consistent with the expected statistical signature of a multi-objective optimizer applied to molecules already within the clinical-stage envelope. The platform reported a mean of 12.8 months and USD 723,889 of computational front-loading per project across the nine-project cohort (range 9.0-16.0 months; USD 510,000-960,000); the underlying cost assumptions are tabulated in Supplementary Table S3.
Conclusion: PTIm-mAb produces externally verifiable, literature-aligned improvements on the metrics most directly under its control, clears CDR-vicinity charge-patch flags on a meaningful fraction of flagged candidates, and front-loads substantial design-iteration work. The cohort-level pattern is consistent with a calibrated multi-objective optimizer operating at the edge of detectable headroom on a deliberately hard benchmark. We position the platform as an early-stage triage and lead-optimization layer in bispecific antibody discovery. For molecules whose first-pass result does not clear all flags, iterative re-invocation of the pipeline on the optimized output is a natural follow-up direction.
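The Wilcoxon effect size quoted in this abstract can be sanity-checked from the reported p-value. The sketch below is not the authors' calculation: it assumes the common convention r = Z / sqrt(N), a normal approximation for Z, and N = 17 (the number of evaluable paired arms stated in the abstract). The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_effect_size(p_two_sided: float, n_pairs: int) -> float:
    """Approximate effect size r = Z / sqrt(N) from a two-sided p-value.

    Assumption: Z is recovered from the p-value via the standard normal
    quantile function, which is only an approximation for small cohorts.
    """
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return z / sqrt(n_pairs)


# Values taken from the abstract: two-sided p = 0.0075, 17 paired arms (assumed N).
r = wilcoxon_effect_size(0.0075, 17)
print(f"approximate r = {r:.2f}")
```

Under these assumptions the recovered value rounds to the reported r = 0.65, which suggests the effect size was computed over the 17 paired arms.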

19

Structure-guided computational design and mechanistic understanding of the p95HER2-targeting NAZ-mAb antibody and its variants

Rawat, P.; Kyte, J. A.; Greiff, V.; Dorraji, E.

2026-07-11 bioinformatics 10.64898/2026.07.07.736817 medRxiv

Top 0.1%

39.2%

Human epidermal growth factor receptor 2 (HER2) is an oncogenic receptor tyrosine kinase in breast cancer and other malignancies. A subset of HER2-positive tumours expresses 611-CTF-p95HER2, a tumour-specific, hyperactive truncated isoform associated with metastasis and treatment resistance that lacks most of the extracellular domain targeted by conventional HER2-directed antibodies. We previously developed NAZ-mAb (formerly known as Oslo-2), a monoclonal antibody against 611-CTF-p95HER2. Here, we describe a computational antibody-engineering workflow for designing variants of NAZ-mAb. Starting from the sequence alone, we modeled the NAZ-mAb-611-CTF-p95HER2 complex, generated a combinatorial mutational landscape using FoldX 5.0, and prioritized candidate variants using predicted interaction energy and developability criteria. Two variants representing distinct design strategies were selected for validation: an aromatic double mutant, NAZ-mAb v1 (L:S31W/L:H107W), and a conservative single mutant, NAZ-mAb v2 (L:S31M). Both variants were successfully expressed as recombinant IgGs; NAZ-mAb v2 achieved a five-fold higher recombinant expression yield than parental NAZ-mAb, while both variants retained antigen binding with a higher apparent signal than the parental antibody in indirect ELISA. However, Biacore two-state kinetic analysis revealed weaker affinities than the parental antibody (KD NAZ-mAb v1: 32.6 nM, NAZ-mAb v2: 9.45 nM vs. parental NAZ-mAb: 5.33 nM). These findings show that the computational workflow can generate experimentally tractable, antigen-engaging NAZ-mAb variants, while also highlighting the limitations of fixed-backbone interaction-energy ranking as a predictor of binding affinity and yield. This study provides a practical framework for computationally driven, developability-aware antibody optimization in the absence of experimental structural data.
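To put the reported KD values on the same scale as a predicted interaction energy, the affinity loss can be expressed as a fold change and as an apparent change in binding free energy, ΔΔG = RT ln(KD_variant / KD_parent). The sketch below is illustrative only: the temperature of 298.15 K is an assumption not stated in the abstract, the function names are hypothetical, and the KD values come from a two-state kinetic fit, so the result is an apparent ΔΔG.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change(kd_variant_nm: float, kd_parent_nm: float) -> float:
    """Ratio of dissociation constants; values above 1 mean weaker binding."""
    return kd_variant_nm / kd_parent_nm


def apparent_ddg(kd_variant_nm: float, kd_parent_nm: float,
                 temp_k: float = 298.15) -> float:
    """Apparent ddG of binding in kcal/mol; positive means destabilized.

    Assumption: temp_k defaults to 298.15 K, which the source does not state.
    """
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_variant_nm / kd_parent_nm)


KD_PARENT = 5.33  # nM, parental NAZ-mAb (from the abstract)
variants = {"NAZ-mAb v1": 32.6, "NAZ-mAb v2": 9.45}  # nM (from the abstract)

results = {
    name: (fold_change(kd, KD_PARENT), apparent_ddg(kd, KD_PARENT))
    for name, kd in variants.items()
}
for name, (fold, ddg) in results.items():
    print(f"{name}: {fold:.1f}-fold weaker, apparent ddG = {ddg:+.2f} kcal/mol")
```

Both variants come out at roughly 1 kcal/mol or less under these assumptions, a small difference for a fixed-backbone energy function to rank reliably.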

20

Machine Learning enables efficient and effective affinity maturation of nanobodies

Paul, S. B.; Harvey, E. P.; Osei-Owusu, J.; Kollasch, A. W.; Riesselman, A. J.; McMahon, C.; Gazizov, A.; Anuganti, M.; Belay, F.; Kieu, M. A.; Zhu, H.; Hollingsworth, L. R.; Harper, J. W.; Moshinsky, D. J.; Teixeira, A. R. R.; Marks, D. S.; Kruse, A. C.

2026-01-12 bioinformatics 10.64898/2026.01.11.698911 medRxiv

Top 0.1%

38.5%

Antibodies can bind their targets with exquisite potency and selectivity due in part to large antibody-target protein-protein interaction surface areas. Despite the very large size and diversity of synthetic libraries, in vitro sorting alone tends to yield binders with modest affinities. By analogy to in vivo affinity maturation in the natural immune system, these initial hits are typically affinity matured in vitro to achieve high-affinity binding. However, affinity maturation campaigns can be laborious, often requiring multiple selection rounds and strategies for each clone to be optimized. Here, we investigated whether one could accelerate the discovery of optimized binders using machine learning on sequencing data from single selection sorts of affinity maturation yeast-display campaigns. Our results show that sparse sequencing data from a single sorting round can predict sequences that are enriched after multiple rounds. We also find that linear models outperform deep neural networks and semi-supervised approaches in ranking validated affinity-enhancing substitutions. Linear models are also more interpretable, offering insights into residue preferences that can be leveraged for further engineering. We use our models to design and select optimized nanobody binders to relaxin family peptide receptor 1 (RXFP1), yielding multiple improved binders, including three sub-nanomolar binders, with the best exhibiting a ~2500-fold improvement over WT.
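As a rough illustration of how a linear model can rank substitutions from a single sorting round, the sketch below fits an additive (one weight per substitution) model to log enrichment. It is not the authors' pipeline: the sequences and read counts are invented toy data, and the pseudocount, ridge penalty, gradient-descent settings, and all function names are assumptions made for this example.

```python
import math


def substitutions(seq, wild_type):
    """List substitutions relative to wild type, e.g. 'A1W' (1-based position)."""
    return [f"{w}{i}{s}"
            for i, (w, s) in enumerate(zip(wild_type, seq), start=1) if w != s]


def log_enrichment(pre, post, pseudo=1.0):
    """Log ratio of post-sort to pre-sort read counts, with a pseudocount."""
    return math.log((post + pseudo) / (pre + pseudo))


def fit_additive_model(variants, wild_type, ridge=0.01, lr=0.05, steps=5000):
    """Fit y ~ bias + sum of per-substitution weights by gradient descent.

    variants: list of (sequence, pre_count, post_count) tuples.
    The ridge penalty applies to the weights only, not to the bias.
    """
    feats = [substitutions(seq, wild_type) for seq, _, _ in variants]
    y = [log_enrichment(pre, post) for _, pre, post in variants]
    weights = {f: 0.0 for fs in feats for f in fs}
    bias = 0.0
    n = len(variants)
    for _ in range(steps):
        residuals = [bias + sum(weights[f] for f in fs) - yi
                     for fs, yi in zip(feats, y)]
        grad = {f: ridge * w for f, w in weights.items()}
        for fs, r in zip(feats, residuals):
            for f in fs:
                grad[f] += r / n
        bias -= lr * sum(residuals) / n
        for f in weights:
            weights[f] -= lr * grad[f]
    return bias, weights


# Toy data (hypothetical): three-residue "library" around wild type ASG.
WILD_TYPE = "ASG"
toy_variants = [
    ("ASG", 100, 100),
    ("WSG", 100, 400),
    ("ASR", 100, 25),
    ("WSR", 100, 110),
    ("AYG", 100, 200),
]

bias, weights = fit_additive_model(toy_variants, WILD_TYPE)
ranked = sorted(weights, key=weights.get, reverse=True)
for name in ranked:
    print(f"{name}: {weights[name]:+.2f}")
```

The fitted weights can be read directly as per-residue preferences, which is the interpretability advantage the abstract attributes to linear models.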