mAbs — Latest Matching Preprints

1

OpenGerminal: an open-source implementation of the Germinal antibody design pipeline

Han, B.; Li, S.

2026-06-29 bioinformatics 10.64898/2026.06.25.734527 medRxiv

Top 0.1%

44.7%

Germinal is a recently described computational pipeline for de novo antibody design that combines AlphaFold-Multimer hallucination with antibody language model guidance to generate epitope-targeted antibodies. Germinal identified binders with nanomolar-to-low-micromolar affinities by testing only 43-101 designs per target across four diverse antigens, establishing it as a practical tool for epitope-directed antibody design accessible to standard academic laboratories. As this architecture is itself very recent, systematic replacement and benchmarking of its individual components remains largely unexplored, yet offers a valuable opportunity to probe the robustness of the underlying design. We present OpenGerminal, which replaces PyRosetta with a fully open-source stack comprising OpenMM 8.5.1, FreeSASA, FASPR, Biopython, and sc-rs v1.0.0, and adopts AbLang1 (ablang2 v0.2.1) as the sole antibody language model in place of IgLM. Benchmarking on two VHH targets (PD-L1 and IL-3) reveals that OpenGerminal achieves a markedly higher cofolding pass rate (PD-L1: 33.7% vs. 18.6%; IL-3: 24.6% vs. 8.0%) with equivalent or improved Chai-1 structural confidence metrics in accepted designs, at the cost of a modest increase in per-trajectory computation time (>=1.5x). Multi-chain target support is also extended and verified to run without error on the official insulin example. OpenGerminal provides the first systematic benchmarking of IgLM versus AbLang1 within the Germinal architecture, and its fully open-source component stack broadens the range of deployment contexts in which the pipeline can be used.

2

Benchmarking AI-Driven PTIm-mAb Across Eleven FDA-Approved Bispecific Antibodies: A Cross-Tool Validation Study

Addepalli, M. K.; Prattipati, M.

2026-07-10 bioinformatics 10.64898/2026.07.07.736933 medRxiv

Top 0.1%

39.5%

Background: Late-stage attrition in therapeutic antibody discovery is dominated by developability liabilities: aggregation, polyspecificity, charge-driven non-specific binding, and chain-mispairing artefacts. Bispecific antibodies amplify these risks because each additional binding arm adds a new biophysical envelope that must be jointly satisfied. The existing in-silico ecosystem addresses individual axes of this problem (humanization, structure prediction, single-metric developability scoring) but few platforms integrate them end-to-end. PTIm-mAb (SANSHI Bio Solutions Pvt Ltd) is a multi-objective, AI/ML-driven antibody design platform that jointly optimizes sequence liabilities, surface aggregation, charge balance, humanness, and predicted binding affinity, and recommends a bispecific architecture in a single workflow. Methods: We applied PTIm-mAb to the published sequences of eleven FDA-approved bispecific antibodies using the platform's default-parameter Pareto-acceptance optimization loop, run to convergence or to the internal iteration ceiling, with no human curation between the platform run and the external profiler. Both wild-type and platform-optimized sequences were profiled independently with three publicly available developability tools: Aggrescan, CamSol, and the Therapeutic Antibody Profiler (TAP). Paired-sample tests (Wilcoxon signed-rank, exact binomial sign test, McNemar exact test) evaluated the direction and significance of changes. Results: Across the 17 evaluable paired arms profiled by TAP, PTIm-mAb cleared four wild-type CDR-vicinity Positive Charge Patch (PPC) flags: Blinatumomab-Arm1 (1.9952 → 0.6885), Mosunetuzumab-Arm1 (1.3391 → 0.0568), Linvoseltamab-Arm2 (0.8060 → 0.0), and the headline Elranatamab-Arm1 case (1.7981 → 0.5799), achieved without trading off any other in-range metric and corroborated by Aggrescan and CamSol on the same arm. Total CDR length was significantly shortened across the cohort (Wilcoxon two-sided p = 0.0075, one-sided p = 0.0037, effect size r = 0.65), a significant improvement on the metric most directly under the optimizer's control. The directional shift on Aggrescan integrated aggregation propensity was also significant by sign test (24 of 36 chains improved, 2 unchanged, 10 worsened; p = 0.021). On the already-clean Zenocutuzumab profile the optimizer identified residual headroom (PPC 0.1191 → 0.0; SFvCSP 12.5 → 6.0), demonstrating that the platform's value extends to candidates that pass all flags. Three results (Teclistamab Arm-1, Emicizumab, and Talquetamab Arm-2) did not clear all flags and are presented as candidates for iterative re-invocation of the platform pipeline on the optimized output (planned follow-up; Section 5). The remaining TAP metrics (PSH, PPC magnitude, PNC, |SFvCSP|) trended in the improvement direction without reaching significance in this cohort, a pattern consistent with the expected statistical signature of a multi-objective optimizer applied to molecules already within the clinical-stage envelope. The platform reported a mean of 12.8 months and USD 723,889 of computational front-loading per project across the nine-project cohort (range 9.0-16.0 months; USD 510,000-960,000); the underlying cost assumptions are tabulated in Supplementary Table S3. Conclusion: PTIm-mAb produces externally verifiable, literature-aligned improvements on the metrics most directly under its control, clears CDR-vicinity charge-patch flags on a meaningful fraction of flagged candidates, and front-loads substantial design-iteration work. The cohort-level pattern is consistent with a calibrated multi-objective optimizer operating at the edge of detectable headroom on a deliberately hard benchmark. We position the platform as an early-stage triage and lead-optimization layer in bispecific antibody discovery. For molecules whose first-pass result does not clear all flags, iterative re-invocation of the pipeline on the optimized output is a natural follow-up direction.
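
The Wilcoxon effect size quoted in this abstract can be sanity-checked with a few lines of Python. The sketch below assumes the common convention r = Z / sqrt(N), with Z recovered from the two-sided p-value through the normal quantile and N taken as the 17 evaluable paired arms; the abstract does not say which convention or N the authors used, and the function name is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_effect_size(p_two_sided: float, n_pairs: int) -> float:
    """Approximate r = Z / sqrt(N), with Z back-calculated from a two-sided p-value.

    Assumption: the normal approximation to the signed-rank statistic applies.
    """
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return z / sqrt(n_pairs)


# Values quoted in the abstract; N = 17 is an assumption (evaluable paired arms).
r = wilcoxon_effect_size(p_two_sided=0.0075, n_pairs=17)
print(f"r = {r:.2f}")
```

Under these assumptions the result comes out close to the reported r = 0.65, which suggests the quoted p-value, effect size, and cohort size are mutually consistent.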

3

Structure-guided computational design and mechanistic understanding of the p95HER2-targeting NAZ-mAb antibody and its variants

Rawat, P.; Kyte, J. A.; Greiff, V.; Dorraji, E.

2026-07-11 bioinformatics 10.64898/2026.07.07.736817 medRxiv

Top 0.1%

39.2%

Human epidermal growth factor receptor 2 (HER2) is an oncogenic receptor tyrosine kinase in breast cancer and other malignancies. A subset of HER2-positive tumours expresses 611-CTF-p95HER2, a tumour-specific, hyperactive truncated isoform associated with metastasis and treatment resistance that lacks most of the extracellular domain targeted by conventional HER2-directed antibodies. We previously developed NAZ-mAb (formerly known as Oslo-2), a monoclonal antibody against 611-CTF-p95HER2. Here, we describe a computational antibody-engineering workflow for designing variants of NAZ-mAb. Starting from the sequence alone, we modeled the NAZ-mAb-611-CTF-p95HER2 complex, generated a combinatorial mutational landscape using FoldX 5.0, and prioritized candidate variants using predicted interaction energy and developability criteria. Two variants representing distinct design strategies were selected for validation: an aromatic double mutant, NAZ-mAb v1 (L:S31W/L:H107W), and a conservative single mutant, NAZ-mAb v2 (L:S31M). Both variants were successfully expressed as recombinant IgGs; NAZ-mAb v2 achieved a five-fold higher recombinant expression yield than parental NAZ-mAb, while both variants retained antigen binding with a higher apparent signal than the parental antibody in indirect ELISA. However, Biacore two-state kinetic analysis revealed weaker affinities than the parental antibody (KD NAZ-mAb v1: 32.6 nM, NAZ-mAb v2: 9.45 nM vs. parental NAZ-mAb: 5.33 nM). These findings show that the computational workflow can generate experimentally tractable, antigen-engaging NAZ-mAb variants, while also highlighting the limitations of fixed-backbone interaction-energy ranking as a predictor of binding affinity and yield. This study provides a practical framework for computationally driven, developability-aware antibody optimization in the absence of experimental structural data.

4

Multi-Scale Machine Learning for Antibody-Antigen Binding Affinity Prediction Using Deep Mutational Scanning and Structural Features

Sivasubramani, S.

2026-06-23 bioinformatics 10.64898/2026.06.09.730151 medRxiv

Top 0.1%

38.4%

Predicting how mutations alter antibody-antigen binding affinity is essential for antibody engineering and vaccine design, yet current methods generalize poorly to unseen complexes. We present a multi-scale machine learning framework integrating 93 descriptors across four modalities: physicochemical, structural, ESM-2 protein language model, and solvent-accessible surface area (SASA)/ΔΔGfold features. Under leave-one-complex-out deep mutational scanning (LOCO-DMS) cross-validation on AbAgym (36,541 mutations, 68 experiments, 13 pathogens), gradient boosting achieved MCC = 0.206; a confidence-stratified ensemble reached MCC = 0.374 (83.5% accuracy, 25.5% coverage). No single modality exceeds the majority baseline alone; only multi-scale fusion succeeds. Boltzmann ceiling analysis shows 45.9% of mutations are near-neutral (|ΔΔG| < kBT), bounding theoretical maximum MCC at 0.473; our method achieves 79.1% of this limit. Five deep learning architectures benchmarked under LOCO-DMS showed self-attention matching gradient boosting (MCC = 0.200). Cross-pathogen transfer failed systematically (mean 46.7%), confirming universal binding predictors remain an open challenge.
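
For readers less familiar with the metric, the Matthews correlation coefficient (MCC) and the "fraction of ceiling" figure can be reproduced with a short calculation. This is a minimal sketch: the confusion-matrix counts are hypothetical, and only the two MCC values (0.374 and 0.473) come from the abstract.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not from the preprint).
example = mcc(tp=40, tn=45, fp=5, fn=10)

# Fraction of the theoretical ceiling, using the two values quoted in the abstract.
fraction_of_ceiling = 0.374 / 0.473
print(f"example MCC = {example:.3f}, fraction of ceiling = {fraction_of_ceiling:.1%}")
```

The ratio 0.374 / 0.473 rounds to the 79.1% stated above, so the ceiling comparison refers to the confidence-stratified ensemble rather than the gradient-boosting baseline.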

5

Hybrid quantum-classical de novo design of MHC-binding peptides

Engdal, E. S.; Funk, J.; Bacarreza, O.; Machado, L.; Johansen, K. H.; Kemming, J.; Farnsworth, T.; Brasas, V.; Lefevre-Morand, R. Y. L.; Slysz, M.; Noerregaard, O. L.; Sandberg, O. A. D. A.; Makarovskiy, A.; Lodahl, P.; Acevedo-Rocha, C. G.; Kurowski, K.; Hadrup, S. R.; Clements, W. R.; Jenkins, T.

2026-07-10 biochemistry 10.64898/2026.07.09.736951 medRxiv

Top 0.1%

18.4%

Deep generative models have become a leading approach for designing therapeutic molecules, yet efficiently exploring vast biomolecular sequence spaces remains difficult, particularly for targets with limited training data. The prior distribution that seeds a generative model shapes which regions of sequence space it explores, and recent work suggests that non-classical distributions sampled from quantum processors can serve as a structured alternative to the factorised Gaussian priors used by default. Whether such priors help on complex biological design tasks has been largely untested. Here we present the first end-to-end hybrid quantum-classical pipeline for de novo design of MHC class I-binding peptides, coupling a generative adversarial network (GAN) to latent vectors sampled from a real photonic quantum processor. Tested in silico across 131 HLA alleles, quantum-derived priors increased the yield of predicted strong binders, with the largest relative gains for understudied alleles where classical baselines perform worst. We selected three understudied alleles for further evaluation, finding that large gains coincided with broader sequence exploration at non-anchor positions while anchor specificity was preserved. On these three alleles, we validated the designs in vitro using peptide-MHC stability ELISAs, confirming that quantum-designed peptides are potent stabilisers of peptide-MHC class I complexes. These results establish structured, hardware-realisable non-classical priors as a useful inductive bias for generative peptide design, with direct relevance to personalised immunotherapies and vaccines.

6

Frozen Protein Foundation-Model Embeddings Improve Antibody-Antigen ΔΔG Ranking

Wang, R.; Jin, K.; Pan, L.

2026-07-14 bioinformatics 10.64898/2026.07.13.738250 medRxiv

Top 0.1%

9.7%

We investigate whether representations from AINN-P1, a protein foundation model trained autoregressively on tens of millions of natural protein sequences, transfer to the task of ranking antibody-antigen pairs by binding affinity. Casting affinity maturation as a learning-to-rank problem over the change in binding free energy (ΔΔG), we compare a task-specific sequence model trained end-to-end from scratch against lightweight downstream heads built on top of frozen AINN-P1 embeddings, all evaluated under an identical five-fold cross-validation protocol. A regularized linear probe on the frozen embeddings already surpasses the from-scratch baseline, and an optimized lightweight head raises the mean Spearman rank correlation from 0.42 to 0.53, a relative improvement of approximately 28%, while training in seconds and without any fine-tuning of the foundation model. Because a linear probe alone exceeds the fully trained end-to-end baseline, the gain is attributable to representation quality rather than to added downstream-model capacity. These results position frozen foundation-model embeddings as a strong, data-efficient default for affinity ranking in antibody engineering and establish a conservative lower bound that task-adaptive fine-tuning is expected to exceed.
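
The evaluation metric here, Spearman rank correlation, is simply a Pearson correlation computed on ranks. The sketch below is a plain-Python illustration with average ranks for ties; the ΔΔG values are invented for the example and the helper names are not from the preprint.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Invented ΔΔG values (kcal/mol), for illustration only.
measured = [-1.2, 0.3, 0.8, 2.1, -0.4]
predicted = [-0.9, 0.1, 1.5, 1.1, -0.2]
rho = spearman(measured, predicted)
print(f"Spearman rho = {rho:.2f}")
```

Because only the ordering matters, a predictor can score well on this metric even when its absolute ΔΔG estimates are poorly calibrated, which is why it suits a learning-to-rank framing.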

7

Benchmarking AlphaFold and related deep learning approaches for modeling antibody and TCR antigen recognition

Yin, R.; Saravanakumar, S.; Shi, S. Y.; Park, M.; Lin, V.; Lee, J.; Cheung, M.; Felbinger, N.; Kaufman, S.; Eisenberg, M.; Pierce, B.

2026-07-06 bioinformatics 10.64898/2026.07.04.736425 medRxiv

Top 0.1%

9.5%

Determining the structural basis of antigen recognition by antibodies and T cell receptors (TCRs) provides critical insights into effective immune targeting and can inform design of biotherapeutics and vaccines. Accurate computational modeling of antibodies and TCRs in complex with their targets poses a major challenge for predictive methods, including AlphaFold, which is generally accurate for modeling protein complexes but has shown limited success for immune recognition. In this study we assessed the performance of AlphaFold2, AlphaFold3, increased sampling protocols, and related deep learning methods for modeling antibody-protein, antibody-peptide, and TCR-peptide-major histocompatibility complex (pMHC) recognition. We show that increased sampling and AlphaFold3 generally improve performance relative to default sampling and AlphaFold2; however, predictive accuracy and improvement levels varied considerably among interface classes, with antibody-peptide complexes representing a challenge despite their small antigen size. Comparing per-case success across methods showed some complementarity, indicating opportunities for increased success through model pooling approaches, for instance increasing antibody-peptide near-native success from 41% to 59%. Analysis of AlphaFold confidence scores and modeling of a noncanonical complex provided further insights into predictive performance. These results highlight considerations for predictive antibody and TCR complex modeling efforts, while revealing key distinctions among protocols, scoring, and immune complex classes.

8

Peptide:MHC Binding Stability Prediction Using Protein Language Models

Karthikeyan, D.; Vincent, B.; Rubinsteyn, A.

2026-06-29 bioinformatics 10.64898/2026.06.28.735023 medRxiv

Top 0.1%

9.4%

Peptide:MHC class I (pMHC-I) binding stability governs the persistence of antigenic complexes at the cell surface and plays a key role in facilitating downstream immunological signals such as antigen presentation, T-cell activation, and immunodominance. However, methods for in silico stability prediction remain underexplored relative to binding affinity prediction, in part because available half-life datasets are sparse and expensive to collect. Here, we perform a systematic reassessment of pMHC-I stability prediction using controlled, similarity-aware data splits and apply a recently introduced supervised transfer-learning strategy to MINT, an interaction-aware protein language model, pre-trained on binding affinity and fine-tuned for quantitative half-life prediction. We show that MINT improves stability prediction over standard ESM-2 representations and existing predictors, and that assay-conditioned recalibration corrects systematic shifts across experimental measurement modalities. Across eluted ligand, immunogenicity, and personalized neoantigen prioritization benchmarks, predicted stability provides signal beyond binding affinity, enriching for naturally presented and immunogenic peptides within affinity-filtered candidate sets. These results establish pMHC-I half-life as an orthogonal and transferable biophysical signal connecting peptide binding, surface presentation, and T-cell recognition, and provide a leakage-aware, assay-aware framework for future antigen-presentation modeling.

9

RNA-Encoded PGT121-LS Anti-HIV Antibody: Comprehensive Preclinical Characterization and Translational Pharmacokinetics

Tolksdorf, F.; Nelke, J.; Johannson, R.; Caesar, J.; Chaturvedi, A.; Kopp, A.; Fischer, L.; Malz, A.; Kratochvil, S.; Gerhard, I.; Bogen, J. P.; Morin, C.; Kullmann, M.; Seaman, M. S.; Tomaras, G. D.; Yates, N. L.; Ackerman, M. E.; Weiner, J. A.; Ellinghaus, U.; Stadler, C. R.; Sahin, U.; Le Douce, V.

2026-06-29 immunology 10.64898/2026.06.24.734219 medRxiv

Top 0.1%

9.3%

Human Immunodeficiency Virus (HIV)-1 broadly neutralizing antibodies (bNAbs) have demonstrated clinical efficacy, but face manufacturing challenges associated with recombinant protein production and purification. Here, we present a ribonucleic acid (RNA)-encoded bNAb (RibobNAb) platform that enables in vivo antibody production of the clinically validated bNAb PGT121 via lipid nanoparticle (LNP) delivery, supporting rapid evaluation of Fc variants (LS, del294, LS-del294) in vitro and in vivo. We confirmed expression, sub-nanomolar HIV-1 Env binding, and potent neutralization across all RibobNAb variants in vitro. In mice, single RNA-LNP administrations yielded in vivo expression of all RibobNAb variants, with PGT121-LS exhibiting a prolonged half-life compared with PGT121. In non-human primates (NHPs), a single intravenous administration of PGT121-LS RNA-LNP was well tolerated without anti-drug antibody (ADA) formation over 180 days and resulted in PGT121-LS half-lives comparable to the reference protein. Single intramuscular administration showed RibobNAb expression but resulted in ADA development from Day 14 onwards and lower bioavailability. In vivo-expressed PGT121-LS RibobNAb retained identical antiviral functionality to PGT121-LS reference protein. An NHP pharmacokinetics model integrating RNA transfection and translation dynamics enabled allometric scaling and first-in-human dose prediction. We highlight RibobNAbs as an alternative to conventional purified protein antibodies for rapid development of bNAb-based therapeutic strategies.

10

HALPred-B: Host-Aware Linear B-Cell Epitope Prediction: Challenges, Limitations, and Variability Across Species

Gautam, P.; Mitra, P.; Sinha, I.

2026-06-26 bioinformatics 10.64898/2026.06.22.733770 medRxiv

Top 0.1%

8.9%

Predicting linear B-cell epitopes is a basic immunoinformatics task that has a direct impact on vaccine design and antibody engineering. Recent advances in machine learning have improved predictive performance, but most existing approaches are trained on aggregated datasets and assume that antigenic patterns are conserved across host organisms. This assumption ignores host-dependent immunological variability and limits generalization across species. We present the first systematic host-wise, machine learning-based evaluation of host-aware linear B-cell epitope prediction using curated datasets from the Immune Epitope Database (IEDB). We build separate datasets for human, mouse, and non-human primate hosts and assess several classification models, including Random Forest, Support Vector Machine (SVM), Gradient Boosting, XGBoost, and K-Nearest Neighbors (KNN). The models exploit feature representations derived from sequences, such as AAIndex descriptors, biochemical properties from ExPASy, and dipeptide composition. Our results show that predictive performance differs substantially across hosts. Models achieve up to 86.07% accuracy and 0.93 ROC-AUC on human datasets but lower performance on mouse and non-human primate datasets. This gap reflects dataset bias and sequence distribution differences, as well as the inability of existing features to capture host-specific immunological context. These results indicate that the prediction of linear B-cell epitopes is intrinsically host-specific, and a single global model does not generalize well across species. We propose to incorporate host-aware modeling strategies and organism-specific features for enhanced predictive reliability and biological relevance.

11

Strict OOD Antigen-to-Antibody Retrieval with CDR-Aware Slot Late Interaction

Liu, P.; Pan, M.; Yan, C.; Li, F.; Zhang, J.

2026-07-03 bioinformatics 10.64898/2026.06.30.735486 medRxiv

Top 0.1%

8.8%

Antigen-specific antibody retrieval aims to rank candidate antibodies for a target antigen, providing an early virtual-screening step before structural modeling or experimental validation. Existing sequence-based antibody-antigen interaction studies often formulate the problem as pairwise binding prediction, and random or non-clustered evaluations can overestimate generalization when related antigens appear across training and test data. We study a strict antigen-cluster out-of-distribution (OOD) retrieval setting in which test antigens come from sequence clusters unseen during training. This setting is difficult because binding is driven by local epitope-CDR complementarity, while available databases mainly contain observed positive complexes and lack reliable negative labels for unlabeled candidates. We propose Ab-CASLR, an antibody CDR-aware slot late-interaction retriever that encodes antigens with ESM-2, encodes antibodies with IgBert, constrains antibody-side latent slots to complementarity-determining regions (CDRs), and scores local slot compatibility instead of single-vector global similarity. On a strict OOD benchmark with 849 antigen queries and 869 candidate antibodies, the model achieves 7.42% Hits@10, outperforming k-mer homology transfer at 5.53% Hits@10 and yielding 6.28-fold enrichment over exact random screening at K = 10. Ablations and diagnostics show that CDR-constrained antibody slots remain diverse, whereas antigen-side latent slots collapse into similar summaries. These results support CDR-aware local antibody representation as a useful inductive bias for early binder recovery under strict OOD evaluation, while antigen-side epitope grounding remains unresolved.
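
Hits@K and its random-screening baseline can be made concrete with a short sketch. The function names and the toy ranking are hypothetical, and the baseline assumes a known number of true binders per query, which the abstract does not state; the one-positive case shown here is for illustration only and is not expected to reproduce the reported 6.28-fold enrichment exactly.

```python
from math import comb


def hits_at_k(ranked_ids, positive_ids, k):
    """True if at least one known binder appears in the top-k of a ranked list."""
    return any(candidate in positive_ids for candidate in ranked_ids[:k])


def random_hits_at_k(n_candidates, n_positives, k):
    """Chance that k candidates drawn uniformly without replacement include a positive.

    This is one minus the hypergeometric probability of drawing zero positives.
    """
    miss = comb(n_candidates - n_positives, k) / comb(n_candidates, k)
    return 1.0 - miss


# Toy query: antibody "ab7" is the only known binder (made-up identifiers).
ranking = ["ab3", "ab7", "ab1", "ab9"]
print(hits_at_k(ranking, {"ab7"}, k=2))

# Baseline for 869 candidates, assuming exactly one positive per query.
baseline = random_hits_at_k(n_candidates=869, n_positives=1, k=10)
print(f"random Hits@10 = {baseline:.4f}")
```

With a single positive per query the baseline reduces to K / N = 10 / 869; queries with several known binders raise it, which is one reason an exact per-query baseline matters when reporting enrichment.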

12

Scalable Production of a De Novo SARS-CoV-2 Antiviral miniprotein in Escherichia coli

Shin, J.; Kim, E.-m.; Jang, J.-h.; Jee, S.-w.; Kim, S.-h.; Yu, S.; Yoon, M.; Craig, D.; Swoyer, R.; Alamuri, P.; Price, A.; Patel, S.; Ravichandran, R.; Carter, L.; Pallerla, S.

2026-06-24 bioengineering 10.64898/2026.06.23.734092 medRxiv

Top 0.1%

7.8%

The rapid emergence of SARS-CoV-2 variants that evade neutralizing antibodies underscores the need for next-generation antiviral biologics that combine molecular precision with scalable, cost-effective manufacturing. Computationally designed miniproteins targeting the receptor-binding domain (RBD) of the spike protein offer a compelling alternative to monoclonal antibodies due to their small size, high thermal stability, and compatibility with microbial expression systems. Here we report the end-to-end development and cGMP production of IPD-52520, a de novo antiviral miniprotein, using an optimized E. coli platform. Two miniprotein candidates, a homotrimeric construct (the Trimer, referred to as IPD-52520, 17 kDa) and a tandem fusion (the Daisy, referred to as IPD-52521, 25 kDa), were evaluated in parallel through systematic optimization of strain selection, media composition, fed-batch fermentation, inclusion-body solubilization, refolding, and chromatographic purification. The Trimer was downselected as the lead molecule based on superior preclinical efficacy, favorable pharmacokinetic properties, and higher volumetric manufacturing yields. The optimized process delivers approximately 2 g/L of purified protein at greater than 90% purity. Scale-up from 5 L to 50 L under cGMP conditions demonstrated excellent batch-to-batch reproducibility across six independent batches, supporting nonclinical and Phase 1 clinical supply. Comprehensive biophysical characterization confirmed a well-folded, predominantly alpha-helical trimer (Tm = 73.4 °C; polydispersity = 1.005) with an intact primary structure and strong target-binding affinity (KD < 1 pM). Real-time stability studies indicate that the drug substance is stable at 2-8 °C for at least 12 months, with ongoing stability studies. These results demonstrate the feasibility of translating computationally designed antiviral miniproteins into manufacturable biologics and provide a platform applicable to rapid-response therapeutics against current and future pandemic threats.

13

Development and Characterisation of a Versatile Single-Domain Antibody Specific for M1-linked Ubiquitin Chains

Koch, J.; Bhark, S.-J.; Bader, V.; Fiil, B. K.; Lopez-Mendez, B.; Rasthoej, J. B.; Priesmann, D.; Mejias-Gomez, O.; Braghetto, M.; Montoya, G.; Gyrd-Hansen, M.; Winklhofer, K. F.; Goletz, S.; Damgaard, R. B.

2026-07-06 biochemistry 10.64898/2026.07.05.736589 medRxiv

Top 0.1%

7.8%

Ubiquitin signalling is mediated by structurally distinct polyubiquitin chains that encode discrete cellular functions. Progress in deciphering this ubiquitin code, particularly for the less abundant atypical chain types, has been hindered by limited availability of versatile chain type-specific affinity reagents. Here, we demonstrate that human single-domain antibodies (sdAbs) provide a versatile scaffold for the generation of ubiquitin linkage-specific binders. Using phage display and synthetic human sdAb libraries, we identified 2A6, an sdAb that specifically recognises methionine-1 (M1)-linked ubiquitin chains. To our knowledge, 2A6 represents the first reported sdAb with specificity for a defined homotypic ubiquitin chain linkage. 2A6 bound M1-linked ubiquitin chains with nanomolar affinity and was specific for M1-linked chains at the level of both diubiquitin and long polyubiquitin chains. AlphaFold3 modelling, supported by saturation mutagenesis, predicted that 2A6 recognises the proximal and distal ubiquitin moieties together with the region near the M1 linkage. Functionally, 2A6 enabled specific detection and enrichment of M1-linked ubiquitin across multiple applications, including ELISA, immunoblotting, immunoprecipitation under semi-denaturing conditions, substrate ubiquitination analysis, and immunofluorescence microscopy. The sdAb can be readily produced in E. coli from a single expression plasmid, providing a tractable, cost-effective and versatile reagent for investigating M1-linked ubiquitin signalling. Our work establishes sdAbs as a versatile scaffold for ubiquitin linkage-specific affinity reagents, providing a framework for the development of analogous binders specifically targeting additional ubiquitin linkages or architectures.

14

Folding scFv–Antigen Complexes at Scale

Shah, R. N.; Ouyang-Zhang, J.; Cohen, Z.; Briglia, M. R.; Zhang, C.; Klivans, A.; Diaz, D. J.

2026-07-03 bioinformatics 10.64898/2026.07.01.730981 medRxiv

Top 0.1%

7.7%

Accurate modeling of antibody-antigen (Ab-Ag) complexes is central to biologic development, yet the reliability and failures of modern Ab-Ag folding pipelines remain poorly characterized. Single-chain variable fragments (scFvs) are therapeutically important antibodies, but large-scale evaluations of structure prediction models on scFv-Ag complexes are largely lacking. We introduce a scalable benchmarking pipeline that generates large ensembles of scFv-Ag structure predictions by cofolding a curated subset of 3,800 Ab-Ag complexes from SAbDab using multiple state-of-the-art models under diverse inference-time settings. The resulting dataset, SCALE (scFv-Ag CompLex Ensembles) includes standardized scFv-Ag sequences and around 200,000 predicted complexes spanning different models, sampling strategies, and auxiliary inputs. Using SCALE, we evaluate model performance in recovering correct scFv-Ag interfaces and assess the ability of existing confidence metrics to select the best structure from prediction ensembles. We find that while confidence scores effectively distinguish easy from hard scFv-Ag complexes, they often fail to identify the highest-quality interface for a given target. Further analysis shows that near-correct interfaces typically appear in ensembles but at low frequency, and inference-time choices like sampling, recycling, and using evolutionary or structural information are crucial for accurate scFv-Ag complex predictions. Dataset and analysis code are available at https://huggingface.co/datasets/ravishah1/SCALE

15

Prediction-Guided Design of a More Developable FGF21 Construct

Bozkurt, C.; Nathanail, E.; Goteti, A.

2026-07-14 bioengineering 10.64898/2026.07.13.738140 medRxiv

Top 0.1%

6.8%

For structural-biology and protein-production pipelines, the hardest part of a difficult protein is not the biology; it is obtaining a well-behaved sample for functional studies. Programs routinely stall at construct design, expression, and purification: deciding where to truncate, which tags to use, how to express, and how to purify so the protein survives concentration and handling. These decisions are still made largely by literature precedent and experimental experience, and they require trial-and-error before arriving at a functional construct for hard targets. We present a prospective, single-pair wet-lab case study testing whether an integrated computational platform can improve these decisions. For human fibroblast growth factor 21 (FGF21), a clinically important and stability-challenged metabolic hormone, we compared two expression constructs produced side by side under the same experimental workflow, using two different design strategies: one designed by a scientist from the literature (reproducing the published core-domain construct, PDB 6M6E), and one designed by the Orbion platform, an AI, prediction-guided protein-design system (orbion.life), which additionally generated the expression and purification protocols (executed scientist-in-the-loop). The platform's construct used an unconventional, longer C-terminal boundary not found in public sequence databases. Since the two constructs differ in more than one feature, we treat them as workflow-level designs throughout. The scientist construct gave a higher initial yield (~2.4× more protein recovered at affinity capture). The platform-designed construct, however, showed a more favourable downstream developability profile: it concentrated higher (1.4 vs 0.7 mg/mL) while remaining more monodisperse by dynamic light scattering (DLS). The scientist construct, in contrast, aggregated on concentration, so its initial-yield advantage did not survive: in the final concentrated sample the Orbion construct provided the more usable material for downstream studies. Computed for the mammalian host used, the platform had prospectively scored its own design higher (composite 68.7 vs 59.0 for the scientist-designed construct), and its predictions of yield, solubility, and disorder matched the wet-lab outcome. This is a single, deliberately scoped case study, not a population-level benchmark; the two constructs differ in more than one feature, and biological activity was not assayed. Alongside the bottlenecks of this approach discussed here, prediction-guided construct and protocol design, used as a decision aid, has the potential to remove costly iteration cycles from protein production campaigns.

16

FCRL5 is a fucose-sensitive IgG-Fc receptor with binding properties distinct from classical Fcγ receptors

van der Hoeven, N.; Holborough-Kerkvliet, M. D.; Bao, Y.; Bentlage, A. E.; de Heer-Ooijevaar, P.; Derksen, N. I.; Damelang, T.; de Kreuk, B.-J.; Labrijn, A. F.; Vidarsson, G.; Rispens, T.

2026-07-07 immunology 10.64898/2026.07.01.735886 medRxiv

Top 0.1%

6.6%

Fc receptor-like protein 5 (FCRL5) is a low-affinity IgG receptor expressed on B cells, with emerging therapeutic relevance due to its expression on multiple myeloma cells, and a potential role in regulating B cell responses. Previous reports on the FCRL5-IgG interaction vary widely in reported affinities, binding differences across IgG subclasses, and molecular requirements for maximal binding. Furthermore, the impact of Fc-engineering strategies, as used in (therapeutic) monoclonal antibodies, remains poorly understood. Here, we provide a comprehensive biochemical analysis of the FCRL5-IgG interaction. We demonstrate that FCRL5 is a true IgG Fc-receptor, binding with very low affinity (60-80 µM). FCRL5 binds IgG in a manner involving primarily the two N-terminal domains of FCRL5, and the third domain for maximal binding, but with distinct essential residues in the IgG Fc-tail. Surface plasmon resonance analysis of the binding of FCRL5 to the various IgG subclasses revealed a preference for IgG1 and IgG4. Interestingly, various Fc-engineered IgG variants commonly used for silencing or enhancing Fc receptor binding do not impact FCRL5 binding. Screening the binding of a set of IgG antibodies carrying defined sets of Fc-mutations to FCRL5 revealed E293 as a key binding determinant and led to the discovery of E293R as a mutation that selectively abrogates FCRL5 binding while preserving binding to other classical FcγRs. Lastly, we show that FCRL5 has considerable preference for binding afucosylated IgG. Together, our results define the essential characteristics of the IgG-FCRL5 interaction and demonstrate the potential of both naturally occurring IgG variants as well as therapeutically explored bioengineered IgG formats to differentially engage FCRL5.

17

IgGM2: An All-Atom Foundation Model for Adaptive Immune Receptor Design

Ma, J.; Wu, F.; Yao, L.; Gao, J.; Wang, R.; Li, Q.; Yang, N.; Jiang, S.; Huang, D.; Pan, X.; Zhu, Y.; Hou, T.; Yao, J.; Yan, J.

2026-07-09 bioinformatics 10.64898/2026.07.09.737510 medRxiv

Top 0.1%

6.1%

Accurate immune receptor design requires modeling the coupled variation of amino-acid sequence, full-atom conformation, and target-binding geometry across antibodies, nanobodies, and T-cell receptors (TCRs). Existing methods often address only part of this problem, either by separating structure generation from sequence design, relying on fixed-backbone inverse folding, or focusing on a single receptor class. We introduce IgGM2, a unified all-atom generative framework for immune receptor structure prediction and CDR sequence-structure co-design. IgGM2 follows a structure-to-design strategy: it first learns how immune receptors are positioned around fixed target structures, and then transfers this target-conditioned structural prior to CDR design. Unlike modular design pipelines, IgGM2 jointly generates CDR residue identities and full-atom receptor structures, allowing framework geometry to adapt to designed CDRs without separate inverse folding or external sidechain packing. Unlike continuous residue encodings based on virtual-atom geometry, IgGM2 keeps sequence prediction explicit while using atom14 placeholders only for full-atom representation. On structure prediction benchmarks, IgGM2 better captures receptor-target spatial relationships than AlphaFold3 on FoldBench and achieves strong performance on TCR-pMHC modeling. On sequence design benchmarks, IgGM2 improves amino-acid recovery and Rosetta-based interface preference metrics, suggesting more favorable generated binding interfaces. These results support IgGM2 as a unified all-atom framework for adaptive immune receptor structure prediction and design.

18

AI-guided discovery for low-resource peptide engineering using evolutionary scale modeling

Andrekson, L.; Rydbergh, R.; Mercado, R.; Wenzel, M.

2026-07-01 bioinformatics 10.64898/2026.06.25.734678 medRxiv

Top 0.1%

4.9%

Reliable estimation of downstream performance in low-data peptide machine learning is critical for guiding early-stage AI-driven peptide engineering. Yet, it is often unclear how to assess whether a model will be effective in iterative discovery settings. Here, we show that the cross-validation R² score can serve as a simple and robust proxy for predicting active learning workflow performance, enabling early-stage evaluation of model suitability for sequential peptide optimization. To support this, we introduce SCARSE, a machine learning framework combining ESM-2 protein language model embeddings with Gaussian process regression and extremely randomized trees classification, designed for low-resource peptide property prediction (20-500 training samples). We benchmark SCARSE across 23 peptide and small-protein datasets covering substitution and indel variants, antimicrobial peptides, cell-penetrating peptides, and toxic/non-toxic peptides. SCARSE significantly outperforms a hand-engineered descriptor baseline on substitution and indel tasks, while comparable performance was achieved on shorter peptide non-mutant datasets where simpler descriptors capture enough of the signal. In simulated active learning workflows, SCARSE consistently outperforms baseline and random sampling strategies. Notably, we demonstrate that CV R² computed from as few as 50 labeled peptides can be sufficient to estimate final active learning end-point performance, providing a practical, data-efficient criterion for deciding whether a given dataset combined with SCARSE is suitable for iterative peptide discovery. SCARSE is released as a pip package and is available via HuggingFace Spaces to facilitate integration into peptide engineering workflows.
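
As a rough illustration of the metric the abstract relies on, the sketch below computes a pooled k-fold cross-validated R² on 50 synthetic samples. It is not the SCARSE implementation: the single scalar feature and the ordinary least-squares line are stand-ins (assumptions) for ESM-2 embeddings and Gaussian process regression, and the names `fit_line` and `cv_r2` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on one feature (stand-in model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def cv_r2(xs, ys, k=5, seed=0):
    """Pooled out-of-fold R^2: every sample is predicted by a model that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a * xs[i] + b
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): 50 "peptides", one informative feature.
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(50)]
ys_signal = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]  # label depends on the feature
ys_noise = [rng.gauss(0.0, 1.0) for _ in xs]             # label unrelated to the feature

r2_signal = cv_r2(xs, ys_signal)
r2_noise = cv_r2(xs, ys_noise)
print(f"CV R^2 with signal: {r2_signal:.2f}; without signal: {r2_noise:.2f}")
```

A CV R² near 1 indicates that held-out labels are predicted well, while a value near or below 0 indicates the model does no better than predicting the mean; the abstract's claim is that this number, computed early, tracks later active learning performance.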

19

BoltzProt-1: Towards Efficient De Novo Binder Design with Good Developability

Ucar, T.; Bates, J.; Fu, Y.; Shi, W.; Stark, H.; Nava, D.; Cavalleri, L.; Wohlwend, J.; Corso, G.; Passaro, S.

2026-06-27 bioinformatics 10.64898/2026.06.23.733997 medRxiv

Top 0.1%

4.0%

Designing binders against novel protein targets remains a central challenge in computational drug discovery. Here we introduce BoltzProt-1, a pipeline for generating protein binders, including nanobodies, with improved hit rates and favorable developability properties. At its core lie a refined iteration of BoltzGen's generative model and a novel protein-protein interaction prediction model, BoltzPPI. Employing BoltzPPI instead of BoltzGen's standard structure-prediction confidence metrics to rank nanobody (VHH) designs increases the confirmed-binder hit rate from 3.3% to 8.0% across 10 novel targets. Assessed on 10 additional targets used in prior literature, the BoltzProt-1 pipeline obtains nanobody screening hits for 7 of 10 targets, surpassing the 6 of 10 previously reported by Chai-2. Finally, evaluating the developability of BoltzProt-1-designed nanobodies in terms of stability, aggregation, purity, polyspecificity and hydrophobicity reveals that 58% of its confirmed binders pass every criterion, exceeding both BoltzGen (40%) and clinical-stage VHH controls (21%).

20

AptCancerDB: A Curated Knowledgebase and Translational Discovery Platform for Anticancer Aptamers

Bajiya, N.; Singh, S.; Raghava, G. P. S.

2026-07-09 cancer biology 10.64898/2026.07.02.735999 medRxiv

Top 0.2%

3.5%

Aptamers are emerging as important molecular recognition ligands in oncology, playing significant roles in cancer diagnostics, targeted therapies, drug delivery systems, and molecular imaging. Numerous aptamers have advanced to clinical trials, indicating their potential for real-world applications; however, existing databases fail to capture this progress. To bridge this critical gap, we developed AptCancerDB (https://webs.iiitd.edu.in/raghava/aptcancerdb/), a comprehensive, manually curated database of experimentally verified anticancer aptamers. The current release contains 1,941 entries collected from studies published between 2000 and 2025, covering 29 cancer types, approximately 200 cancer cell lines, and direct links to 22 clinical trials. Each entry is annotated with sequence information, target details, cancer type, cell line, SELEX methodology, affinity determination data, chemical modifications, and biological activities. The dataset is dominated by ssDNA aptamers (82.7%), reflecting their superior stability and ease of synthesis, while ssRNA aptamers account for only 16.6% and appear primarily in studies targeting complex intracellular or protein-protein interactions. To facilitate structural analysis, predicted secondary structures, dot-bracket notations, specific structural elements, and minimum free energy values were also included. AptCancerDB integrates a MySQL backend with an ArcadeDB/OpenCypher-based Knowledge Graph, enabling exploration of relationships among aptamers, targets, cancer types, cell lines, and functional applications. The platform provides advanced search and browsing facilities, BLASTn-based similarity searching, and a GC calculator. Built on a modern, responsive frontend (React/TypeScript/Tailwind CSS), the platform includes a REST API for data retrieval.
By integrating fragmented experimental data into a unified cancer-focused resource, AptCancerDB supports comparative analysis, aptamer discovery, and the development of next-generation aptamer-based diagnostics and therapeutics.

Highlights:
- Curated knowledge base of experimentally validated anticancer aptamers.
- AptCancerDB contains therapeutic, tumor-homing and cell-penetrating aptamers.
- Summarizes clinical progress and translational trends in anticancer aptamer research.
- Supports rational aptamer design using molecular, functional, and clinical annotations.
- Disease-focused resource for cancer diagnosis, therapy, and drug delivery.

Teaser: AptCancerDB maintains experimentally validated anticancer aptamers relevant to diagnosis, drug delivery, and therapy.