mAbs — Latest Matching Preprints

1

OpenGerminal: an open-source implementation of the Germinal antibody design pipeline

Han, B.; Li, S.

2026-06-29 bioinformatics 10.64898/2026.06.25.734527 medRxiv

Top 0.1%

44.7%

Germinal is a recently described computational pipeline for de novo antibody design that combines AlphaFold-Multimer hallucination with antibody language model guidance to generate epitope-targeted antibodies. Germinal identified binders with nanomolar-to-low-micromolar affinities by testing only 43-101 designs per target across four diverse antigens, establishing it as a practical tool for epitope-directed antibody design accessible to standard academic laboratories. As this architecture is itself very recent, systematic replacement and benchmarking of its individual components remains largely unexplored, yet offers a valuable opportunity to probe the robustness of the underlying design. We present OpenGerminal, which replaces PyRosetta with a fully open-source stack comprising OpenMM 8.5.1, FreeSASA, FASPR, Biopython, and sc-rs v1.0.0, and adopts AbLang1 (ablang2 v0.2.1) as the sole antibody language model in place of IgLM. Benchmarking on two VHH targets (PD-L1 and IL-3) reveals that OpenGerminal achieves a markedly higher cofolding pass rate (PD-L1: 33.7% vs. 18.6%; IL-3: 24.6% vs. 8.0%) with equivalent or improved Chai-1 structural confidence metrics in accepted designs, at the cost of a modest increase in per-trajectory computation time (>=1.5x). Multi-chain target support is also extended and verified to run without error on the official insulin example. OpenGerminal provides the first systematic benchmarking of IgLM versus AbLang1 within the Germinal architecture, and its fully open-source component stack broadens the range of deployment contexts in which the pipeline can be used.

2

Benchmarking AI-Driven PTIm-mAb Across Eleven FDA-Approved Bispecific Antibodies: A Cross-Tool Validation Study

Addepalli, M. K.; Prattipati, M.

2026-07-10 bioinformatics 10.64898/2026.07.07.736933 medRxiv

Top 0.1%

39.5%

Background: Late-stage attrition in therapeutic antibody discovery is dominated by developability liabilities: aggregation, polyspecificity, charge-driven non-specific binding, and chain-mispairing artefacts. Bispecific antibodies amplify these risks because each additional binding arm adds a new biophysical envelope that must be jointly satisfied. The existing in-silico ecosystem addresses individual axes of this problem (humanization, structure prediction, single-metric developability scoring) but few platforms integrate them end-to-end. PTIm-mAb (SANSHI Bio Solutions Pvt Ltd) is a multi-objective, AI/ML-driven antibody design platform that jointly optimizes sequence liabilities, surface aggregation, charge balance, humanness, and predicted binding affinity, and recommends a bispecific architecture in a single workflow. Methods: We applied PTIm-mAb to the published sequences of eleven FDA-approved bispecific antibodies using the platform's default-parameter Pareto-acceptance optimization loop, run to convergence or to the internal iteration ceiling, with no human curation between the platform run and the external profiler. Both wild-type and platform-optimized sequences were profiled independently with three publicly available developability tools: Aggrescan, CamSol, and the Therapeutic Antibody Profiler (TAP). Paired-sample tests (Wilcoxon signed-rank, exact binomial sign test, McNemar exact test) evaluated the direction and significance of changes. Results: Across the 17 evaluable paired arms profiled by TAP, PTIm-mAb cleared four wild-type CDR-vicinity Positive Charge Patch (PPC) flags: Blinatumomab-Arm1 (1.9952 → 0.6885), Mosunetuzumab-Arm1 (1.3391 → 0.0568), Linvoseltamab-Arm2 (0.8060 → 0.0), and the headline Elranatamab-Arm1 case (1.7981 → 0.5799), which was achieved without trading off any other in-range metric and was corroborated by Aggrescan and CamSol on the same arm. 
Total CDR length was significantly shortened across the cohort (Wilcoxon two-sided p = 0.0075, one-sided p = 0.0037, effect size r = 0.65), a significant improvement on the metric most directly under the optimizer's control. The directional shift on Aggrescan integrated aggregation propensity was also significant by sign test (24 of 36 chains improved, 2 unchanged, 10 worsened; p = 0.021). On the already-clean Zenocutuzumab profile the optimizer identified residual headroom (PPC 0.1191 → 0.0; SFvCSP 12.5 → 6.0), demonstrating that the platform's value extends to candidates that pass all flags. Three results (Teclistamab Arm-1, Emicizumab, and Talquetamab Arm-2) did not clear all flags and are presented as candidates for iterative re-invocation of the platform pipeline on the optimized output (planned follow-up; Section 5). The remaining TAP metrics (PSH, PPC magnitude, PNC, |SFvCSP|) trended in the improvement direction without reaching significance in this cohort, a pattern consistent with the expected statistical signature of a multi-objective optimizer applied to molecules already within the clinical-stage envelope. The platform reported a mean of 12.8 months and USD 723,889 of computational front-loading per project across the nine-project cohort (range 9.0-16.0 months; USD 510,000-960,000); the underlying cost assumptions are tabulated in Supplementary Table S3. Conclusion: PTIm-mAb produces externally verifiable, literature-aligned improvements on the metrics most directly under its control, clears CDR-vicinity charge-patch flags on a meaningful fraction of flagged candidates, and front-loads substantial design-iteration work. The cohort-level pattern is consistent with a calibrated multi-objective optimizer operating at the edge of detectable headroom on a deliberately hard benchmark. We position the platform as an early-stage triage and lead-optimization layer in bispecific antibody discovery. 
For molecules whose first-pass result does not clear all flags, iterative re-invocation of the pipeline on the optimized output is a natural follow-up direction.
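
The reported Wilcoxon effect size can be sanity-checked from the two-sided p-value. The sketch below is not the authors' code. It assumes the common convention r = Z / sqrt(N), a normal approximation for Z, and N = 17 pairs (the number of evaluable paired arms quoted in the abstract); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def effect_size_from_p(p_two_sided: float, n_pairs: int) -> float:
    """Approximate r = Z / sqrt(N) from a two-sided p-value.

    Assumption: Z is recovered with a normal approximation, which is
    only an approximation to an exact Wilcoxon signed-rank test.
    """
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return z / sqrt(n_pairs)


# Values quoted in the abstract: two-sided p = 0.0075, 17 paired arms.
r = effect_size_from_p(0.0075, 17)
print(round(r, 2))
```

Under these assumptions the result rounds to the quoted r = 0.65, which suggests the effect size was derived from the 17 TAP-evaluable pairs.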

3

Structure-guided computational design and mechanistic understanding of the p95HER2-targeting NAZ-mAb antibody and its variants

Rawat, P.; Kyte, J. A.; Greiff, V.; Dorraji, E.

2026-07-11 bioinformatics 10.64898/2026.07.07.736817 medRxiv

Top 0.1%

39.2%

Human epidermal growth factor receptor 2 (HER2) is an oncogenic receptor tyrosine kinase in breast cancer and other malignancies. A subset of HER2-positive tumours expresses 611-CTF-p95HER2, a tumour-specific, hyperactive truncated isoform associated with metastasis and treatment resistance that lacks most of the extracellular domain targeted by conventional HER2-directed antibodies. We previously developed NAZ-mAb (formerly known as Oslo-2), a monoclonal antibody against 611-CTF-p95HER2. Here, we describe a computational antibody-engineering workflow for designing variants of NAZ-mAb. Starting from the sequence alone, we modeled the NAZ-mAb-611-CTF-p95HER2 complex, generated a combinatorial mutational landscape using FoldX 5.0, and prioritized candidate variants using predicted interaction energy and developability criteria. Two variants representing distinct design strategies were selected for validation: an aromatic double mutant, NAZ-mAb v1 (L:S31W/L:H107W), and a conservative single mutant, NAZ-mAb v2 (L:S31M). Both variants were successfully expressed as recombinant IgGs; NAZ-mAb v2 achieved a five-fold higher recombinant expression yield than parental NAZ-mAb, while both variants retained antigen binding with a higher apparent signal than the parental antibody in indirect ELISA. However, Biacore two-state kinetic analysis revealed weaker affinities than the parental antibody (KD NAZ-mAb v1: 32.6 nM, NAZ-mAb v2: 9.45 nM vs. parental NAZ-mAb: 5.33 nM). These findings show that the computational workflow can generate experimentally tractable, antigen-engaging NAZ-mAb variants, while also highlighting the limitations of fixed-backbone interaction-energy ranking as a predictor of binding affinity and yield. This study provides a practical framework for computationally driven, developability-aware antibody optimization in the absence of experimental structural data.

4

Multi-Scale Machine Learning for Antibody-Antigen Binding Affinity Prediction Using Deep Mutational Scanning and Structural Features

Sivasubramani, S.

2026-06-23 bioinformatics 10.64898/2026.06.09.730151 medRxiv

Top 0.1%

38.4%

Predicting how mutations alter antibody-antigen binding affinity is essential for antibody engineering and vaccine design, yet current methods generalize poorly to unseen complexes. We present a multi-scale machine learning framework integrating 93 descriptors across four modalities: physicochemical, structural, ESM-2 protein language model, and solvent-accessible surface area (SASA)/ΔΔGfold features. Under leave-one-complex-out deep mutational scanning (LOCO-DMS) cross-validation on AbAgym (36,541 mutations, 68 experiments, 13 pathogens), gradient boosting achieved MCC = 0.206; a confidence-stratified ensemble reached MCC = 0.374 (83.5% accuracy, 25.5% coverage). No single modality exceeds the majority baseline alone; only multi-scale fusion succeeds. Boltzmann ceiling analysis shows 45.9% of mutations are near-neutral (|ΔΔG| < kBT), bounding theoretical maximum MCC at 0.473; our method achieves 79.1% of this limit. Five deep learning architectures benchmarked under LOCO-DMS showed self-attention matching gradient boosting (MCC = 0.200). Cross-pathogen transfer failed systematically (mean 46.7%), confirming universal binding predictors remain an open challenge.
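
The "79.1% of this limit" figure is the ratio of the ensemble MCC to the estimated ceiling. A minimal sketch of both quantities follows, assuming the standard binary Matthews correlation coefficient; the function names and the toy confusion counts are illustrative and not taken from the paper.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def fraction_of_ceiling(observed: float, ceiling: float) -> float:
    """Share of the theoretical maximum reached by the observed score."""
    return observed / ceiling


# Values quoted in the abstract: ensemble MCC 0.374, ceiling 0.473.
share = fraction_of_ceiling(0.374, 0.473)
print(f"{100 * share:.1f}% of the ceiling")

# Invented toy confusion matrix, only to exercise mcc().
toy = mcc(tp=40, tn=40, fp=10, fn=10)
```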

5

Aiki-GeNano: Multi-Stage Preference Optimization for Generative Design of Developable Nanobodies

Meda, R. S.; Doshi, J.; Iyer, E.; Shastry, S.; Mysore, V.

2026-05-01 bioinformatics 10.64898/2026.04.28.721526 medRxiv

Top 0.1%

38.3%

Therapeutic nanobodies must combine target binding with biophysical and chemical properties that determine manufacturability, stability, and clinical viability, collectively termed developability, yet most computational design pipelines still treat developability as a post-hoc filter rather than an integrated training objective. We present Aiki-GeNano, a three-stage language-model alignment pipeline for epitope-conditioned nanobody generation that integrates multiple developability signals directly into training, using only sequence information and previously published predictors. Across 65 target epitopes and relative to the supervised baseline, the combined pipeline raised predicted mean melting temperature by 6.6 °C, halved isomerization-motif severity, reduced deamidation, N-glycosylation sequons and CDR methionine-oxidation motifs, and preserved predicted humanness and solubility. On a shared 10-target GPCR benchmark, Aiki-GeNano achieved the highest predicted melting temperature and the lowest isomerization severity among five contemporary VHH generators. Starting from ProtGPT2 and a 1.35-million-pair binder dataset generated on an mRNA-display platform, the pipeline applies supervised fine-tuning, Direct Preference Optimization on 522,800 pairs ranked by a composite of selectivity, predicted thermal stability, solubility, and humanness, and Group Reward-Decoupled Policy Optimization against six sequence-based rewards (FR2 hydrophobicity, hydrophobic-patch coverage, chemical-liability motifs, Wilkinson-Harrison expression probability, VHH hallmark residues, scaffold integrity). Generated sequences differ from the nearest training sequence by a mean of 8.1-9.0 amino acids out of 126, and two alternative training trajectories converge to distinct amino-acid-composition strategies with similar liability outcomes but different thermal-stability gains, indicating initialization-dependent convergence of the reward-optimized policy. 
Predicted humanness was preserved at the level of the camelid VHH scaffold of the training library -- a data-side limitation rather than a methodological one, since the framework was effectively constant across all preference pairs. Applicability to the drug discovery and development pipeline, limitations of predicted-property evaluation, and future work are discussed.
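
For readers unfamiliar with the second training stage, the standard Direct Preference Optimization objective for a single preference pair can be sketched as follows. This is the generic published DPO loss, not the Aiki-GeNano implementation; the function name, argument names, and the value of beta are assumptions made for illustration.

```python
from math import exp, log, log1p


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Generic DPO loss for one (chosen, rejected) sequence pair.

    Inputs are summed log-probabilities of each full sequence under the
    policy being trained and under the frozen reference model.
    beta = 0.1 is an assumed value, not one reported in the abstract.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return log1p(exp(-margin))
    return -margin + log1p(exp(margin))


# With no preference signal the loss equals log(2).
neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)
# Raising the chosen sequence's likelihood relative to the reference
# lowers the loss; raising the rejected one increases it.
good = dpo_loss(-1.0, -5.0, -3.0, -3.0)
bad = dpo_loss(-5.0, -1.0, -3.0, -3.0)
```

In the pipeline described above, the chosen and rejected members of each pair would be the higher- and lower-ranked sequences under the composite developability score.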

6

SNAC-DB: An ML-Ready Database for Antibody and NANOBODY(R) VHH-Antigen Complexes with Expanded Structural Diversity and Real-World Benchmarking

Gupta, A.; Munoz Rivero, B.; Li, R.; Roel-Touris, J.; Fomekong Nanfack, Y.; Wendt, M.; Qiu, Y.; Furtmann, N.

2026-04-26 bioinformatics 10.64898/2026.04.22.720253 medRxiv

Top 0.1%

30.8%

Predicting antibody and NANOBODY(R) VHH-antigen complexes remains a critical challenge for state-of-the-art structure prediction models, limiting their impact in therapeutic discovery pipelines. We introduce SNAC-DB, an ML-ready database and curation pipeline enriched with structural biology expertise, designed to accelerate model accuracy and generalization by providing 31-37% expanded structural diversity over existing resources like SAbDab through comprehensive re-curation that extracts maximum value from available experimental structures. SNAC-DB expands coverage by capturing often-overlooked complexes and accurately identifying complete multi-chain epitopes through improved biological-assembly-based logic. Built for ML practitioners, SNAC-DB provides standardized formats with multi-threshold structure-based clustering to enable principled sample weighting during training. Using a rigorous benchmark of public PDB entries deposited post-May 2024 plus confidential therapeutic structures, we evaluate seven leading models (Protenix-v1, OpenFold-3p2, RosettaFold-3, Boltz-2, Boltz-1x, Chai-1, and AlphaFold2.3-multimer) with evaluation methodology tailored to antibody/NANOBODY(R) VHH-antigen complexes to ensure correct handling of multi-chain epitopes, revealing systematic performance gaps: success rates rarely exceed 25%, confidence-based ranking fails to identify best predictions even when accurate structures exist in ensembles, and all models consistently struggle with therapeutically relevant NANOBODY(R) VHHs. Systematic evaluation of sampling strategies demonstrates that while generating 1000 samples per target substantially increases the likelihood of producing accurate structures (oracle selection improves from 11.9% to 50.5%), confidence-based ranking remains nearly flat (between 10.9% and 14.9%), revealing that improved ranking mechanisms represent a more tractable path to performance gains. 
Finally, fine-tuning GeoDock on SNAC-DB yields higher success rates than training on SAbDab (11.0% vs. 7.1% for antibodies; 7.0% vs. 4.0% for NANOBODY(R) VHHs), suggesting that SNAC-DB's expanded structural diversity translates to improved model generalization. Significance Statement: Computational antibody/NANOBODY(R) VHH design shows promise but remains unreliable for therapeutic development. SNAC-DB provides 31-37% expanded structural diversity through comprehensive data curation, immediately accelerating model development. Benchmarking seven leading AI models reveals accuracy rarely exceeds 25% on therapeutic targets, with confidence-based ranking failing to identify correct structures even when they exist in model outputs. Training on SNAC-DB increases prediction accuracy, validating that high-quality, diverse training data is critical for advancing computational methods toward clinical impact.
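
The gap between oracle selection and confidence-based ranking can be illustrated with a small sketch. It assumes each sampled prediction carries a model confidence and a boolean "correct" label from some success criterion; the data layout, function name, and toy values are invented for illustration and are not the benchmark's actual format.

```python
def selection_success(targets):
    """Return (confidence-ranked, oracle) success rates.

    targets: one list per target, each holding (confidence, is_correct)
    tuples for that target's sampled predictions.
    """
    n = len(targets)
    # Confidence ranking: keep only the highest-confidence sample.
    ranked = sum(max(samples, key=lambda s: s[0])[1] for samples in targets)
    # Oracle: a target succeeds if any sample at all is correct.
    oracle = sum(any(ok for _, ok in samples) for samples in targets)
    return ranked / n, oracle / n


# Invented toy ensemble: the first target has a correct structure that
# the confidence score ranks second, so only the oracle recovers it.
toy_targets = [
    [(0.9, False), (0.6, True)],
    [(0.8, True), (0.3, False)],
    [(0.7, False), (0.5, False)],
]
ranked_rate, oracle_rate = selection_success(toy_targets)
```

Adding more samples per target can only raise the oracle rate, whereas the ranked rate improves only if the confidence score actually places correct structures first, which is the pattern the abstract reports.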

7

Beyond natural amino acids: Extending immunogenicity risk assessment to non-canonical peptide drugs through chemical feature encoding

Cairoli, M.; Nielsen, M.; Betts, C.; Obrezanova, O.; De Maria, L.

2026-05-26 bioinformatics 10.64898/2026.05.22.727138 medRxiv

Top 0.1%

30.7%

Peptide therapeutics are increasingly used to treat challenging diseases, but immunogenicity risks limit their clinical success. In silico tools enable immunogenicity screening through prediction of peptide-MHCII binding, yet current methods fail to capture chemical properties of non-natural amino acids routinely incorporated to improve drug properties. Here, we present a machine learning approach combining chemical fingerprints with sequence information to predict MHC class II binding for both canonical and modified peptides. We propose two molecular representations (direct-encoding and similarity-based chemical fingerprints) that preserve positional information while encoding chemical diversity. These representations achieved performance comparable to sequence-based encodings (BLOSUM62 and one-hot) for canonical peptides while accurately identifying binding cores and motifs. Testing on citrullinated peptides, chemical fingerprints substantially improved quantitative prediction accuracy while maintaining comparable linear correlation across encoding methods, demonstrating the importance of explicit chemical representation for accurate absolute binding affinity prediction. These descriptors can be integrated into pan-allele prediction frameworks, enabling immunogenicity risk assessment across diverse modifications and therapeutic modalities, including peptide therapeutics, antibody-drug conjugates, and synthetic vaccines. The proposed chemistry-informed framework addresses a critical gap in preclinical drug development, facilitating early mitigation strategies before costly clinical trials.

8

An approach for single-amino-acid resolution epitope mapping by kinetic affinity screening of antibody drugs against biosensor on-chip library of deep mutationally-scanned target variants

Agu, C. V.; Martelly, W.; Cook, R. L.; Gushgari, L. R.; Kesiraju, S.; Moreno, S.; Yapici, E.; Mohan, M.; Takulapalli, B.

2026-05-05 immunology 10.64898/2026.04.30.722015 medRxiv

Top 0.1%

22.1%

Epitope mapping is central to rational antibody drug design, affinity optimization and the anticipation of therapeutic resistance mechanisms. Here, we demonstrate the use of Sensor Integrated Proteome on Chip (SPOC) technology for single amino acid resolution epitope mapping. By generating high throughput (HTP) binding kinetics data, we identify important residues within the target epitope whose mutations alter drug-target interactions. The SPOC platform integrates simultaneous HTP cell-free production of folded proteins in nanowells from immobilized plasmid DNAs or linear expression cassettes and capture onto biosensor chips for subsequent label-free binding kinetic analysis using surface plasmon resonance (SPR). The model system comprised the extracellular domain (ECD) of CD20, a membrane-spanning 4-domain family protein, screened against its FDA-approved therapeutic monoclonal antibodies (thAbs) - rituximab and ocrelizumab. Using our proprietary POC protein nanofactory system, a partial deep mutationally scanned (DMS) CD20 ECD mutant library of 79 variants was produced on SPOC biosensor chips via rational single amino acid substitutions of the epitope and surrounding residues with alanine, aspartic acid, lysine, and serine, collectively representing four broad classes of amino acid side chain chemistries: nonpolar, acidic, basic, and polar neutral. The SPOC protein biosensor chip was then screened with both thAbs using SPOC SPR to generate kinetic affinity data, evaluate mutations that led to affinity loss or gain, and ultimately identify critical epitope residues that interface with the antibodies. Most mutations within the rituximab and ocrelizumab epitopes - EPANPSEK and YNCEPANPSEKNSPST, respectively - resulted in complete loss of binding or >25% increase in apparent KD. 
Notably, N171, P172, and S173 mutations, irrespective of side chain substitution, resulted in complete loss of rituximab binding, while at least three diverse side chain substitutions at E168, P169, N171, P172, S173, E174, K175, and T180 led to complete loss of binding for ocrelizumab. These outcomes identify the listed residues as the most critical contact points for their respective antibodies. Interestingly, we also found that functional side-chain substitutions at some residues flanking the epitope increased affinity. This indicates that these non-epitope residues contribute to antibody contact, and that polarity at these sites is a tractable lever for affinity modulation by targeting the corresponding contact residues on the antibody CDRs. The proposed SPOC approach of screening drug candidates against an on-chip library of mutationally scanned therapeutic targets is relevant in the early phase of drug development to resolve epitopes at the residue level and to support more informed down-selection of candidates. It facilitates cost-effective improvement of thAbs, enhancing therapeutic efficacy across a wide array of therapeutic targets, including rare variants that might otherwise lead to therapeutic resistance.
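
The substitution scheme described above (each position replaced with alanine, aspartic acid, lysine, and serine) can be sketched as a simple enumeration. This is an illustration only: the function name is hypothetical, the start position 168 is inferred from the residue labels quoted in the abstract (E168 through K175 for EPANPSEK), and the real 79-variant library also covers surrounding residues not listed here.

```python
def substitution_library(epitope: str, start: int, replacements: str = "ADKS"):
    """Enumerate single substitutions such as 'N171A' for an epitope.

    Identity substitutions (e.g. A170A) are skipped because they do not
    change the sequence.
    """
    variants = []
    for offset, wild_type in enumerate(epitope):
        for new in replacements:
            if new != wild_type:
                variants.append(f"{wild_type}{start + offset}{new}")
    return variants


# Rituximab epitope from the abstract; numbering inferred, see above.
library = substitution_library("EPANPSEK", 168)
print(len(library), library[:4])
```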

9

GermRL: Alleviating The Germline Bias In Autoregressive Antibody Language Models Through Reinforcement Learning

Ludwig, L.; Chungyoun, M.; Gray, J. J.

2026-06-11 bioinformatics 10.64898/2026.06.08.730660 medRxiv

Top 0.1%

22.0%

Antibodies are powerful therapeutics whose antigen specificity arises from sequence diversity shaped during development. Recently, language models trained on large antibody repertoire datasets have enabled the generation and screening of novel candidates, but these models retain a strong germline bias. As AI adoption increases in therapeutic workflows, it is crucial to develop models that harness the diversity of antibodies necessary for the discovery of mutations that encode desirable properties. Previous work explored the germline bias in masked antibody language models, yet the bias in generative autoregressive language models has not yet been addressed. Here, we present GermRL, a lightweight and modular reinforcement learning (RL) framework capable of alleviating the germline bias in pre-trained antibody autoregressive language models through group relative policy optimization (GRPO). GermRL achieves consistent one-shot generation of antibodies that satisfy specified mutation thresholds from germline while maintaining structural plausibility. Under the lowest and highest mutation thresholds tested (5 and 35 mutations from germline), GermRL scores 0.992 and 0.950 pass@1, respectively, compared to 0.398 and 0.034 for the pre-trained language model. Within GermRL, we introduce a key pair of modifications to GRPO that increase training efficiency by discouraging reward hacking under our antibody application. Furthermore, comparison of RL generated and natural antibody sequences reveals how RL based optimization can explore alternative evolutionary mutational patterns and residue compositional strategies while preserving key global properties of natural antibodies, including identifiable germline assignments, embedding-level similarity and comparable developability profiles. 
Thus, RL-trained generative models optimized to promote antibody mutations through diversity from germline provide a promising framework for navigating the antibody sequence landscape, enabling exploration of novel yet biologically plausible candidates for therapeutic design.
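
A minimal sketch of the ingredients named in this abstract: a mutation-threshold reward, standard GRPO group-relative advantages, and pass@1. It does not include GermRL's own modifications to GRPO, which the abstract does not specify. It assumes pre-aligned, equal-length sequences, a binary reward for meeting "at least" the threshold, and pass@1 as the fraction of single generations that pass; all function names are hypothetical.

```python
from statistics import mean, pstdev


def mutations_from_germline(seq: str, germline: str) -> int:
    """Count mismatches between aligned, equal-length sequences."""
    if len(seq) != len(germline):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq, germline))


def threshold_reward(seq: str, germline: str, min_mutations: int) -> float:
    """Assumed binary reward: 1.0 if the mutation threshold is met."""
    return 1.0 if mutations_from_germline(seq, germline) >= min_mutations else 0.0


def group_relative_advantages(rewards, eps: float = 1e-8):
    """Standard GRPO normalisation: (r - group mean) / group std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def pass_at_1(rewards) -> float:
    """Fraction of one-shot generations that satisfy the constraint."""
    return mean(rewards)


# Invented toy group of four generations against a toy germline.
germline = "ACDEFGHIK"
group = ["ACDEFGHIK", "ACDYFGHLK", "ACDEFGHIR", "WCDYFGHLK"]
rewards = [threshold_reward(s, germline, 2) for s in group]
advantages = group_relative_advantages(rewards)
```

Because the advantage is computed relative to the other samples in the group, generations that meet the threshold are reinforced only to the extent that their siblings fail, which is why no separate value network is needed.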

10

Decoding Bispecific Antibody Developability: Design Rules and Predictive Models from a 160-Member Library

Ritter, S.; Rand, L.; Karthick, S.; Bloomingdale, T.; Smith, A.; Ao, X.; Pierre, Y.; Harris, B.; Moller, J.; Bhatt, A.; Bhatt, R.; Schwartz, J.; Grippo, L.; Cohen, R.; Borhani, D. W.; Tessier, P. M.; Arsiwala, A.

2026-06-19 biophysics 10.64898/2026.06.15.732449 medRxiv

Top 0.1%

18.6%

Bispecific antibodies deliver functional outcomes that monospecific antibodies cannot, yet emergent self-association, polyreactivity, and aggregation often degrade their developability relative to their parental arms. Whether bispecific developability inherits from the parents or is driven by the format has not been tested at scale. We characterized 160 bispecific antibodies and their 65 parental arms on a uniform knobs-into-holes CrossMab IgG1 scaffold across 10 assays on the PROPHET-Ab high-throughput platform. Bispecific developability separates into three classes of inheritance. Hydrophobicity and surface charge inherit cleanly from the parents (Spearman ρ ≈ 0.85 to 0.95), so parental-level screening predicts bispecific fate. Self-association and polyreactivity inherit partially (ρ ≈ 0.60 to 0.88), with mechanistically interpretable emergent outliers driven in part by Fv-Fv charge complementarity and a parental biophysical ceiling on the hydrophobicity (HIC) by surface-charge (HAC) plane. Thermostability is poorly predicted from parental antibodies (ρ < 0.4), so it requires bispecific-level testing. The class framework yields actionable selection rules: triage hydrophobicity and charge at the parental level, avoid pairing two high-HIC x high-HAC arms, pair opposite-sign Fv charges to suppress self-association but re-validate at the formulation buffer, and measure thermostability on the bispecific itself. This work charts a tractable path from monospecific sequence to bispecific developability prediction. Significance: Bispecific antibodies are a fast-growing therapeutic class, yet the rational design of well-behaving bispecific antibodies from validated monospecific antibody building blocks remains challenging. A key bottleneck is the lack of comprehensive, high-quality public datasets linking parental antibody developability properties to corresponding bispecific antibody developability properties. 
We address this gap by releasing a dataset comprising 160 bispecific antibodies and the 65 parental monospecific antibodies profiled in 10 developability assays. The data show that bispecific antibody developability is complex. Some properties are easily predictable from the parents, whereas others emerge in the bispecific format or from the bispecific format itself. The factors that govern each property can be identified empirically and used to make practical selection decisions. The mechanistic explanations and predictive models reported here establish a compact set of actionable rules. Together, they define a framework for using computational pipelines to convert monospecific antibodies into bispecific antibodies with drug-like developability properties, enabling faster and more effective generation of high-quality bispecific antibodies for diverse therapeutic applications.

11

Precision at Every Scale: Efficiency in AI-Driven De Novo Antibody Design

Cha, H.; Cho, K.; Gu, J.; Gwak, D.; Ham, S. W.; Hong, M.; Kim, S.; Kim, S.; Kwon, S.; Lee, C.; Lee, D. K.; Lee, D.; Lee, D.; Lim, J.; Noh, J.; Oh, S.; Park, E.; Park, S.; Park, T.; Ryu, E.; Ryu, S.; Sa, D. H.; Seok, C.; Sim, J.; Song, M. Y.; Won, J.; Woo, H.; Yang, J.

2026-05-15 bioengineering 10.1101/2025.11.21.689414 medRxiv

Top 0.1%

18.5%

The precise de novo design of antibodies remains a therapeutic challenge. The AI platform, GaluxDesign, was evaluated in a high-efficiency Precision-Scale Workflow by synthesizing and testing only 50 full-length IgG candidates per epitope across eight distinct epitopes from six therapeutic targets. This campaign yielded a 10.5% binder rate (estimated EC50 < 100 nM), identifying target-specific binders for seven of eight epitopes, with multiple candidates exhibiting sub-nanomolar to single-digit nanomolar dissociation constants (Kd). We further assessed the same workflow on nine shared benchmark targets selected for external comparison, where GaluxDesign identified target-specific binders for eight of nine targets, demonstrating strong target-level performance relative to previously reported de novo antibody design approaches. Together, these results establish a high-efficiency, precision-scale workflow for generating novel, high-affinity therapeutic antibodies.

12

Hybrid quantum-classical de novo design of MHC-binding peptides

Engdal, E. S.; Funk, J.; Bacarreza, O.; Machado, L.; Johansen, K. H.; Kemming, J.; Farnsworth, T.; Brasas, V.; Lefevre-Morand, R. Y. L.; Slysz, M.; Noerregaard, O. L.; Sandberg, O. A. D. A.; Makarovskiy, A.; Lodahl, P.; Acevedo-Rocha, C. G.; Kurowski, K.; Hadrup, S. R.; Clements, W. R.; Jenkins, T.

2026-07-10 biochemistry 10.64898/2026.07.09.736951 medRxiv

Top 0.1%

18.4%

Deep generative models have become a leading approach for designing therapeutic molecules, yet efficiently exploring vast biomolecular sequence spaces remains difficult, particularly for targets with limited training data. The prior distribution that seeds a generative model shapes which regions of sequence space it explores, and recent work suggests that non-classical distributions sampled from quantum processors can serve as a structured alternative to the factorised Gaussian priors used by default. Whether such priors help on complex biological design tasks has been largely untested. Here we present the first end-to-end hybrid quantum-classical pipeline for de novo design of MHC class I-binding peptides, coupling a generative adversarial network (GAN) to latent vectors sampled from a real photonic quantum processor. Tested in silico across 131 HLA alleles, quantum-derived priors increased the yield of predicted strong binders, with the largest relative gains for understudied alleles where classical baselines perform worst. We selected three understudied alleles for further evaluation, finding that large gains coincided with broader sequence exploration at non-anchor positions while anchor specificity was preserved. On these three alleles, we validated the designs in vitro using peptide-MHC stability ELISAs, confirming that quantum-designed peptides are potent stabilisers of peptide-MHC class I complexes. These results establish structured, hardware-realisable non-classical priors as a useful inductive bias for generative peptide design, with direct relevance to personalised immunotherapies and vaccines.

13

Nanobodies versus canonical antibodies: an updated comparison of their binding modes

Hauser, A.; Dangla-Pelissier, G.; Cazals, F.

2026-06-04 bioinformatics 10.64898/2026.06.01.729307 medRxiv

Top 0.1%

14.8%

Heavy-chain-only antibodies, produced by the adaptive immune systems of camelids and cartilaginous fish, complement canonical antibodies that contain variable domains from both heavy and light chains. We refine previous studies by providing a detailed analysis of the binding modes of VHHs versus canonical antibodies, using a dataset with an approximately 20-fold increase in the number of cases. We show that VHHs, despite relying on a single variable domain, exhibit a larger buried surface area than double-domain antibodies. This property can be attributed to contributions from both framework regions and CDR3. We further demonstrate that the binding modes of VHHs, characterized by the number of FR and CDR regions contacting the antigen, are more diverse than previously reported. In addition, we find that VHH and canonical antibody interfaces display similar solvation properties, although VHH interfaces are more tightly packed. Finally, we discuss the thermodynamic and kinetic implications of these findings for the design of high-affinity VHHs, an issue of particular importance in protein engineering and design.

14

Mouse Fc-FcγRIV structure guides Fc engineering for cross-species FcγR recognition

Bajgain, Y.; Guo, M.; Hager, K. M.; Nguyen, A. W.; Zhang, Y.; Maynard, J. A.

2026-05-15 biochemistry 10.64898/2026.05.12.724433 medRxiv

Top 0.1%

13.0%

Antibody-dependent cellular cytotoxicity (ADCC) is a major mechanism of action for many FDA-approved therapeutic antibodies that is driven by interactions between the antibody Fc and Fcγ receptors (FcγRs) on immune effector cells. Murine models used for preclinical antibody evaluation currently have limited predictive value for clinical ADCC performance due to interspecies differences in Fc-FcγR interactions. The molecular determinants governing Fc-FcγR engagement in mice remain poorly defined, complicating the interpretation of murine ADCC data and its clinical relevance. To address this, we present the high-resolution crystal structure of the receptor that regulates Fc-mediated cytotoxicity in mice, mouse FcγRIV, alone and in complex with mouse IgG2a Fc. This complex preserves key features of the human IgG1 Fc-human FcγRIIIa interface, which mediates ADCC in humans, including salt bridges, hydrogen bonds, and a proline sandwich. However, subtle variations in receptor orientation, Fc-FcγR electrostatics, and glycan positions reduce human IgG1 Fc-mouse FcγRIV binding affinity, resulting in species-restricted Fc-FcγR mediated immune responses. Modeling of human IgG1 Fc interactions with mouse FcγRIV predicted steric clashes, suggesting opportunities to modulate the interaction. One structure-guided substitution variant of human IgG1, Fchumo, maintains comparable human FcγRIIIa engagement with enhanced binding to and activation of mouse FcγRIV, relative to human IgG1 Fc. This study provides proof-of-concept for engineering human Fc domains for cross-species FcγR recognition and provides a strategic framework to improve the predictive power of in vivo preclinical models.

15

Antibody-Antigen Affinity Prediction with Chain-Aware Protein Language Modeling

Singh, H.; Malhotra, A.; Srivastava, S. P.; SINGH, R. K.; Gorantla, R.

2026-06-21 bioinformatics 10.64898/2026.06.19.733375 medRxiv

Top 0.1%

12.6%

Motivation: Antibody-antigen affinity determines which antibodies advance in therapeutic discovery, repertoire analysis and affinity maturation, but experimental measurements are sparse relative to the scale of sequence libraries. Structure-based predictors can exploit interface geometry when reliable complexes are available, yet early discovery often requires ranking many heavy-light chain pairs against antigens for which no complex structure exists. Existing sequence-based models are scalable, but frequently compress heavy and light chains into a single antibody representation or concatenate antibody and antigen features, obscuring the chain-specific and epitope-specific signals that drive binding. Results: We present AbAffinity, a sequence-only, chain-aware, three-stream architecture that maintains heavy chain, light chain and antigen as distinct streams. It integrates frozen ESM-2 embeddings with heavy-chain CDR-focused pooling, heavy-light self-attention, adaptive fusion gating and gated cross-attention, training only a compact interaction module. On the SAAINT-DB benchmark, AbAffinity achieves strong predictive performance under ten-fold cross-validation and maintains robust accuracy on novel antigens. It consistently outperforms recent sequence-based models across external benchmarks including SAbDab, AB-Bind and SKEMPI 2.0. Ablation studies highlight the contributions of chain-specific representations, CDR-focused pooling and the gated interaction pathway. Integrated Gradients attributions recover known paratope and epitope residues at structurally validated interfaces. AbAffinity provides a lightweight, explainable sequence-first framework for antibody triage and prioritisation when structural information is limited or unavailable.

16

Discovery 4.0: An Escape-Aware Computational Platform for Resistance-Proofed Chimeric Antigen Receptor Design

Daneshvar, A.; Sharifnia, M.; Mashayekhi, R.

2026-05-27 cancer biology 10.64898/2026.05.24.727464 medRxiv

Top 0.1%

12.5%

Antigen escape is the dominant mechanism of therapeutic failure in chimeric antigen receptor (CAR) T cell and NK cell therapy, occurring in 30-60% of patients treated with single-target constructs. Existing discovery pipelines select epitopes and binders primarily on affinity metrics, neglecting evolutionary pressures that drive antigen editing, downregulation, isoform shifts, and glycosylation remodelling under sustained immunological selection. Here we describe Discovery 4.0, a five-layer computational engine developed at Pioneera Biosciences that encodes antigen escape resistance as a first-class engineering objective. Applied to four clinically validated hematologic antigens (CD19, CD20, CD22, and BCMA), Discovery 4.0 screened 20,000 synthetic binders in silico, designed 300+ CAR constructs, and validated ~100 in co-incubation assays. The leading tri-specific construct achieved a 98.1% reduction in antigen escape relative to the best monospecific control, with an effective escape probability of 0.09%. Discovery 4.0 provides a generalizable, platform-scale framework for escape-resistant immunotherapy design applicable across oncological and autoimmune indications.

17

AGZArank: Investigating epitope-conditioned antibody binder ranking with structure-derived synthetic supervision

Sadykov, Z.; Khamidullina, A.; Sultankulov, B.; Seitkali, D.

2026-06-11 bioinformatics 10.64898/2026.06.08.730711 medRxiv

Top 0.1%

11.6%

Computational antibody design methods can generate large libraries of candidate binders for a target epitope, but prioritizing which candidates to test experimentally remains a major bottleneck. Existing scoring approaches, including physics-based affinity estimators, structure-prediction-derived confidence measures, and inverse-folding likelihood models, provide useful proxy signals but are not explicitly optimized for early enrichment of binders among many structurally similar candidates. Here we investigate epitope-conditioned antibody binder ranking as a dedicated learning problem and introduce AGZArank, a geometric deep learning framework trained with structure-derived synthetic supervision based on normalized pseudo-energy targets. On a benchmark of 45 experimentally validated antibody-antigen interfaces, AGZArank recovered the true binder within the top ten candidates in 44.4% of cases and showed stronger generalization on post-2021 structures than ProteinMPNN, ESM-IF, and PRODIGY. Ablation experiments indicate that ranking performance depends primarily on training scale and alignment between the optimization objective and retrieval-based evaluation, rather than architectural complexity alone. These results support candidate prioritization as a distinct and tractable problem in computational antibody design.
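The top-ten recovery figure in this abstract is a top-k hit rate: the fraction of benchmark cases in which the true binder is ranked at position k or better. A minimal sketch of that metric follows. The function name and the example ranks are hypothetical and do not come from the preprint.

```python
def top_k_hit_rate(true_binder_ranks, k):
    """Fraction of cases whose true binder is ranked within the top k (1-based ranks)."""
    if not true_binder_ranks:
        raise ValueError("need at least one benchmark case")
    hits = sum(1 for rank in true_binder_ranks if rank <= k)
    return hits / len(true_binder_ranks)


# Hypothetical ranks of the true binder in five candidate pools (illustrative only).
example_ranks = [1, 12, 3, 10, 11]
example_rate = top_k_hit_rate(example_ranks, k=10)  # ranks 1, 3 and 10 count as hits

# 20 hits out of 45 cases rounds to the 44.4% quoted in the abstract.
reported_like_rate = round(100 * 20 / 45, 1)
```

On a 45-case benchmark, 44.4% is consistent with 20 recovered interfaces. That count is inferred from the percentage and is not stated in the abstract.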

18

Bioinf-Farma: supervised integration of epitope prediction and recombinant protein developability for automated vaccine candidate prioritization

Bondi, H.; Crespi, M.; Orlando, M.; Lescai, F.; Serapian, S. A.; Colombo, G.; Fasano, M.; Pollegioni, L.; Molla, G.

2026-06-18 bioinformatics 10.64898/2026.06.15.732271 medRxiv

Top 0.1%

10.8%

Vaccine antigen discovery requires prioritizing protein candidates according to both immunogenic potential and recombinant expression feasibility. These properties are typically evaluated using separate computational tools, requiring researchers to integrate heterogeneous outputs through ad hoc workflows. Here, we present BIOINF-farma, a modular platform integrating epitope prediction and developability assessment for rational antigen selection within a unified environment. Candidates can be submitted as amino acid sequences or three-dimensional structures. When experimental structures are unavailable, BIOINF-farma automatically searches for models in AlphaFold DB or performs structure prediction using Boltz-2, ensuring a standardized structural representation for downstream analyses. Antigenicity is quantified by combining structure-based conformational epitope signals (MLCE/REBELOT-BEPPE) and sequence-based linear epitope propensity scores (BepiPred 3.0) into a protein-level Antigenicity Score, with a classification threshold optimized on a manually curated validation dataset. Developability is evaluated through two supervised Random Forest meta-learners that integrate three solubility predictors (DeepSoluE, SoluProt, Protein-Sol) and three thermal stability predictors (TemStaPro, ProLaTherm, BertThermo), whose outputs are combined into an Expression Efficiency Score (EES). By integrating complementary predictive signals, the meta-learning framework achieves greater accuracy and robustness than individual predictors while maintaining performance across a broad range of sequence identities. The Antigenicity Score effectively discriminates antigenic from non-antigenic proteins with a large effect size, whereas EES successfully distinguishes soluble from insoluble outcomes on an independent panel of recombinant proteins expressed in Escherichia coli. BIOINF-farma jointly assesses antigenicity and expression feasibility within a single framework. 
Its modular architecture facilitates the incorporation of future predictive methods, while its web-based interface makes the full pipeline accessible to users without programming expertise, supporting rapid candidate triage in vaccine research and emerging pathogen responses. Author Summary: Vaccine development begins with a critical step: identifying, among the many proteins encoded in a pathogen genome, those most suitable as candidate antigens. A promising candidate must satisfy two requirements that are rarely evaluated together. It must be recognized by the immune system, so that vaccination elicits a protective response; and it must be amenable to recombinant production, since antigens that cannot be obtained in sufficient quantity and quality are of limited practical use. Current computational tools typically address only one of these aspects, and researchers must integrate their outputs manually, through procedures that are time-consuming and prone to inconsistency. We developed BIOINF-farma, an automated platform that brings these two assessments into a single analytical framework. Starting from a protein sequence or an experimental structure, the platform retrieves or predicts a three-dimensional model, evaluates the protein's antigenic potential by combining complementary epitope predictors, and estimates its expression feasibility by integrating multiple solubility and stability predictors through supervised machine learning. A web-based interface makes the full workflow available to experimental immunologists and vaccine developers without requiring computational expertise, supporting rational candidate prioritization in routine vaccine research and during emerging pathogen responses.

19

Zero-Shot Design of a Biobetter Cetuximab: Enhanced EGFR Affinity with Preserved Developability

Weiner, I. N.

2026-05-08 bioengineering 10.64898/2026.05.05.722890 medRxiv

Top 0.1%

9.7%

Cetuximab is a chimeric IgG1 monoclonal antibody that has been a cornerstone therapy for EGFR-driven malignancies for nearly two decades. Its therapeutic activity is governed by competitive displacement of endogenous EGFR ligands, making binding affinity a direct determinant of clinical efficacy. We applied ConvergeAB, a target-aware antibody design platform, in a fully zero-shot configuration to generate a biobetter version of cetuximab. The lead Converge-designed antibody binds EGFR with a mean KD of 315 pM, approximately 2.1-fold tighter than cetuximab (673 pM) and 4.4-fold tighter than a recently published, computationally designed anti-EGFR antibody from Cradle Bio (1.38 nM). The affinity gain arises from six substitutions that leave the global paratope architecture intact (Cα RMSD 0.15 Å vs cetuximab) and instead optimize the binding interface through localized packing and electrostatic adjustments. A panel of biophysical and developability assays (HIC, DLS, DSF, and PSR ELISA) shows that the Converge variant matches or exceeds cetuximab on monomericity, monodispersity, polyspecificity, and thermal stability, while remaining within a developable hydrophobicity envelope. Together, these data demonstrate that a single zero-shot ConvergeAB campaign can deliver a biobetter molecule with significantly improved affinity and a clean developability profile, without compromising the parental antibody's drug-like properties.
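The fold-change figures quoted in this abstract can be checked directly from the reported KD values. The sketch below does that, and also converts the cetuximab comparison into a binding free-energy difference using the standard relation ΔΔG = RT ln(KD_design / KD_reference). The free-energy value is not reported in the abstract, the 298.15 K temperature is an assumption, and the function names are illustrative.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)
ASSUMED_TEMPERATURE_K = 298.15   # assumption: the abstract gives no assay temperature


def fold_tighter(kd_reference, kd_design):
    """How many times lower the design's KD is than the reference KD (same units)."""
    return kd_reference / kd_design


def ddg_binding_kcal(kd_reference, kd_design, temperature_k=ASSUMED_TEMPERATURE_K):
    """Binding free-energy change in kcal/mol; negative means the design binds tighter."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_design / kd_reference)


# KD values as reported in the abstract, all expressed in pM.
KD_DESIGN_PM = 315.0
KD_CETUXIMAB_PM = 673.0
KD_CRADLE_PM = 1380.0  # 1.38 nM

vs_cetuximab = fold_tighter(KD_CETUXIMAB_PM, KD_DESIGN_PM)
vs_cradle = fold_tighter(KD_CRADLE_PM, KD_DESIGN_PM)
ddg_vs_cetuximab = ddg_binding_kcal(KD_CETUXIMAB_PM, KD_DESIGN_PM)
```

Under these assumptions the two ratios round to 2.1 and 4.4, matching the abstract, and the cetuximab comparison corresponds to a free-energy difference of roughly -0.45 kcal/mol.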

20

Frozen Protein Foundation-Model Embeddings Improve Antibody-Antigen ΔΔG Ranking

Wang, R.; Jin, K.; Pan, L.

2026-07-14 bioinformatics 10.64898/2026.07.13.738250 medRxiv

Top 0.1%

9.7%

We investigate whether representations from AINN-P1, a protein foundation model trained autoregressively on tens of millions of natural protein sequences, transfer to the task of ranking antibody-antigen pairs by binding affinity. Casting affinity maturation as a learning-to-rank problem over the change in binding free energy (ΔΔG), we compare a task-specific sequence model trained end-to-end from scratch against lightweight downstream heads built on top of frozen AINN-P1 embeddings, all evaluated under an identical five-fold cross-validation protocol. A regularized linear probe on the frozen embeddings already surpasses the from-scratch baseline, and an optimized lightweight head raises the mean Spearman rank correlation from 0.42 to 0.53, a relative improvement of approximately 28%, while training in seconds and without any fine-tuning of the foundation model. Because a linear probe alone exceeds the fully trained end-to-end baseline, the gain is attributable to representation quality rather than to added downstream-model capacity. These results position frozen foundation-model embeddings as a strong, data-efficient default for affinity ranking in antibody engineering and establish a conservative lower bound that task-adaptive fine-tuning is expected to exceed.
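The evaluation metric in this abstract, Spearman rank correlation, is the Pearson correlation computed on ranks. The sketch below implements it in plain Python with average ranks for ties, alongside the relative-improvement arithmetic. It is an illustrative reimplementation with hypothetical function names, not the authors' evaluation code.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def relative_improvement(baseline, improved):
    """Fractional gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Rounded Spearman values as quoted in the abstract.
gain_from_rounded = relative_improvement(0.42, 0.53)
```

Computed from the rounded values 0.42 and 0.53, the gain comes to about 26%. The abstract's figure of approximately 28% presumably reflects unrounded correlations, which are not given here.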