
mAbs

Informa UK Limited

Preprints posted in the last 90 days, ranked by how well they match mAbs's content profile, based on 28 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
An AI/ML-Powered Workflow for End-to-End Cell Line Development

Raj Unnikandam Veettil, S.; Donatelli, J.; Kalra, G.; Veronica Ljubetic San Martin, C.; Ramakrishnan, S.; McGregor, C.; Wallace, M.; Ankala, R.; Rodrigues de Souza Pinto, L.; Dhama, A.; Regens, C.; Li, Y.; Smith, D.

2026-02-07 cell biology 10.64898/2026.02.04.703387 medRxiv
Top 0.1%
26.9%

The generation of clonal CHO cell lines is foundational to biologics manufacturing; however, labor-intensive cell culture workflows predominate in the field. We created the CLAIRE (Cell Line AI Recognition and Evaluation) tool to streamline end-to-end cell line development (CLD) by integrating deep-learning image analysis with automated liquid handling. We benchmarked three object detection models for monoclonality verification and found that DETR provides superior accuracy (>0.90 F1-score) in identifying single cells. To quantify the outgrowth of cell lines, we evaluated multiple zero-shot SAM2 segmentation models against a feature-based estimation method. Feature-based detection successfully identified diverse cell colony types, while less robust performance was observed for SAM2 models, particularly for sparse-density colonies. The pre-trained DETR and feature-based detection models were wrapped in a task-focused user interface that outputs cell line hitpick lists compatible with a Lynx LM1800 liquid handler, in addition to custom scripts automating cell passaging and sampling. This approach yielded an end-to-end 36-day CLD workflow capable of generating high-titer cell lines for multiple complex antibody structures. Here, we provide open access to our trained models, user interface, and Lynx automation scripts as a modular toolkit useful for clonal cell line engineering projects.

2
KyDab - a comprehensive database of antibody discovery selection campaigns.

Zhou, Q.; Chomicz, D.; Melvin, D.; Griffiths, M.; Yahiya, S.; Reece, S.; Le Pannerer, M.-M.; Krawczyk, K.

2026-03-27 bioinformatics 10.64898/2026.03.25.713450 medRxiv
Top 0.1%
26.3%

Preclinical antibody discovery relies on progressive screening and down-selection of candidate antibodies from large immune repertoires, yet this critical process is poorly represented in existing public databases. Here we introduce KyDab (Kymouse Antibody Database), a well-curated database of antibody discovery selection data generated using standardized workflows on the Kymouse humanized mouse platform. The current release includes 11 immunisation studies in Kymouse platform mice covering 51 immunogens, more than 120,000 paired heavy-light chain sequences, and binding measurements for a selected subset of experimentally characterized clones. By capturing full-funnel selection data with consistent metadata and both positive and negative experimental outcomes, KyDab provides a valuable data resource for the development and evaluation of artificial intelligence models for antibody discovery. KyDab is accessible at https://kydab.naturalantibody.com, and the database will be continuously updated as new datasets become available.

3
CARTiBASE: an interactive knowledge base for CAR sequence retrieval and similarity analysis

Le Compte, G.; Ceylan, H.; Meysman, P.; Laukens, K.

2026-02-26 immunology 10.64898/2026.02.25.707638 medRxiv
Top 0.1%
18.2%

Summary: Chimeric Antigen Receptors (CARs) are modular synthetic constructs that have transformed cellular immunotherapy, enabling targeted recognition and killing of malignant cells. Their clinical success has driven an explosive growth in new receptor designs, but these sequences are dispersed across heterogeneous sources such as publications, patents and supplementary files. This fragmentation and inconsistency limit comparative analysis, reproducibility and the reuse of existing constructs. To address this, we curated and standardized more than 10,000 CAR sequences into a single, harmonized resource. CARTiBASE is a web-based platform that provides standardized annotation, interactive browsing and fast similarity search across this curated collection. This unique database was leveraged to analyse the diversity in current CAR constructs within the public domain, revealing common design trends and lineages, as well as highlighting potential avenues for future CAR development. Availability and Implementation: CARTiBASE is freely available for non-commercial use at https://www.cartibase.org, without mandatory registration. The web server is implemented with a Python/Flask API backend and a Vue-based frontend and supports all major browsers. Users can search and filter thousands of CARs, inspect domain boundaries across signal peptide, antigen-binding domain, hinge, transmembrane, co-stimulatory and intracellular signaling regions, compare constructs and download sequences as FASTA files for downstream use.

4
Comprehensive Mapping of Immune Nanobody Repertoires with NanoMAP

White, W. L.; Moseley, E.; Tremblay, J. M.; Reilly, J.; Da'Darah, A. A.; Skelly, P.; Cowen, L. J.; Shoemaker, C. B.

2026-03-07 immunology 10.64898/2026.03.05.709882 medRxiv
Top 0.1%
12.5%

Nanobodies have recently emerged as alternatives to classical antibodies in therapeutic and diagnostic contexts from parasites to bacteria to viruses, promising improved stability and simpler manufacturing. To improve nanobody discovery efficiency, we developed an integrated experimental and computational pipeline for detailed characterization of the target binding properties of complete alpaca immune repertoires using our custom Nanobody Meta-clustering Analysis Platform (NanoMAP). We tested our pipeline on three distinct pools of targets, immunizing two alpacas with each pool and generating cDNA and phage display libraries from their immune repertoires. We then panned the phage libraries on each target. To produce more detailed binding information, we performed panning variations using subunits, natural variants, intact pathogens, and binding site competitors. Deep sequencing reads from nanobody libraries before and after each panning were pooled and analyzed with NanoMAP to identify nanobody clonal families and assess their levels of enrichment from the library in each panning, reflecting their affinities. NanoMAP outperformed standard clustering methods, producing clonal families that are coherent in sequence and function and detecting rare but high affinity families. By aggregating sequencing data within clonal families, NanoMAP produced reliable and rich data on nanobody repertoire binding phenotypes for each antigen, enhancing nanobody discovery capabilities.

5
CaptureBody - an anti-CD45 x anti-IgG bispecific antibody enables accurate unmixing for spectral flow cytometry

Zambidis, A. E.; Kallur Siddaramaiah, L.; Konecny, A. J.; Gray, M.; Prlic, M.

2026-02-16 immunology 10.64898/2026.02.13.704926 medRxiv
Top 0.1%
10.7%

Accurate spectral unmixing is a critical step for flow cytometry data analysis and requires a single stain control for every fluorescent parameter used in an experiment. Currently, compensation particles are often used for making single stain controls when a target protein is of low abundance or a cell type is of low frequency. However, compensation particles introduce incongruencies in emission spectra compared to cells resulting in spectral unmixing or compensation errors. To enable the use of cells regardless of the abundance of target proteins or immune cell type, we generated a bispecific antibody that links a human anti-CD45 and mouse anti-IgG variable region. We refer to this new bispecific tool as CaptureBody (CB) and highlight the benefits of its final nanobody-based design. We provide all sequences and methods necessary for the in-house expression of a CaptureBody to disseminate their use for spectral flow cytometry experiments.

6
Surface Display For Phage Assisted Continuous Evolution: A Platform For Evolving / Screening Nanobodies In Prokaryote Systems

Flores-Mora, F. E.; Brodsky, J.; Cerna, G. M.; Tse, A.; Hoover, R. L.; Bartelle, B. B.

2026-04-04 synthetic biology 10.64898/2026.04.03.716437 medRxiv
Top 0.1%
9.9%

Despite >50 years of methods development, specific antibodies are still generated at low throughput and remain in high demand across biotechnology. Most biologics and immunoprobes are monoclonal antibodies, developed using a combination of inoculating animals with a target antigen, engineered candidate libraries, and multiple rounds of selection using phage or yeast display. Here we introduce a synthetic biology scheme to eliminate the need for nearly all of these steps, by combining Surface display on E. coli and Phage display with the microvirus ΦX174, Assisting Continuous Evolution (SurPhACE). Instead of building libraries for screening, SurPhACE runs a closed evolutionary program. A typical experiment can have 10^11 mutant candidates under active selection, with complete turnover of the mutant population every 30 min, or >5 × 10^12 unique mutants per day, using less than 100 mL of bacterial culture media. We demonstrate SurPhACE for optimizing a nanobody to a related epitope, and develop novel nanobodies for an arbitrary target using a minimal starting library to establish a proof of concept and identify best practices for this scalable method for generating protein binders.
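
The throughput figures in this abstract can be sanity-checked with a short calculation. The sketch below is only a back-of-the-envelope check, not the authors' method: it assumes that each 30-minute turnover replaces the whole population with entirely new, unique mutants, which the abstract does not state explicitly.

```python
# Back-of-the-envelope check of the throughput figures quoted in the abstract.
population_size = 1e11   # mutant candidates under selection at any time (from the abstract)
turnover_minutes = 30    # complete population turnover interval (from the abstract)

# Number of complete population turnovers in a 24-hour day.
turnovers_per_day = (24 * 60) // turnover_minutes

# Assumption (not stated in the abstract): every turnover yields a fully
# new set of unique mutants, so the daily total is a simple product.
unique_mutants_per_day = population_size * turnovers_per_day

print(turnovers_per_day)                 # 48
print(f"{unique_mutants_per_day:.1e}")   # 4.8e+12
```

Under these assumptions the estimate is about 4.8 × 10^12 mutants per day, the same order of magnitude as the figure quoted in the abstract; the small gap may reflect rounding or details not given there.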

7
Effects of protein interface mutations on protein quality and affinity

de Kanter, J. K.; Smorodina, E.; Minnegalieva, A.; Arts, M.; Blaabjerg, L. M.; Frolenkova, M.; Rawat, P.; Wolfram, L.; Britze, H.; Wilke, Y.; Weissenborn, L.; Lindenburg, L.; Engelhart, E.; McGowan, K. L.; Emerson, R.; Lopez, R.; van Bemmel, J. G.; Demharter, S.; Spreafico, R.; Greiff, V.

2026-03-26 molecular biology 10.64898/2026.03.24.713863 medRxiv
Top 0.1%
7.2%

Accurately modeling antibody-antigen interactions requires distinguishing intrinsic binding affinity ("protein-interaction") from protein biophysical properties ("protein-quality"), including folding, stability, and expression. However, high-throughput mutational measurements commonly used to train and benchmark computational models often conflate these effects, obscuring the true determinants of molecular recognition. Here, we present an experimental and analytical framework to disentangle protein-interaction effects from protein-quality effects in single-domain antibody (VHH)-antigen binding. Using a large-scale deep mutational scanning (DMS) dataset spanning four VHH-antigen complexes, with single and double mutations in both partners, we introduce control binders to quantify protein-quality changes independently of protein-interaction. This enables decomposition of experimentally measured affinity into protein-interaction and protein-quality components at scale. Leveraging the disentangled dataset, we evaluated state-of-the-art structure- and sequence-based models for protein-quality and protein-interaction prediction and show that their performance largely reflects protein-quality rather than protein-interaction effects. Our results highlight a major confounder in current datasets and suggest that accounting for protein-quality will be essential for training next-generation affinity-prediction models.

Nomenclature

Antibody-related terms
- Primary VHH: The VHH of a VHH-antigen complex for which the paratope and the epitope were mutated.
- Control VHH: A second VHH that binds to the same antigen as the primary VHH but has non-overlapping epitope positions and therefore does not bind to any of the mutated antigen positions.

Affinity-related terms
- Real affinity: "The strength of the interaction between two [...] molecules that bind reversibly (interact)" [1]. In the context of antibody-antigen binding, it quantifies interactions between active proteins (which are expressed and correctly folded [2] and are therefore functionally and biologically active; see below). It is commonly quantified by the equilibrium dissociation constant, KD.
- Observed affinity (°KD): The interaction strength experimentally measured between two molecules. Unlike real affinity, this value is confounded by the biophysical properties of the individual binding partners, specifically their folding, stability, and expression levels. Consequently, the observed affinity often differs from the real/intrinsic affinity if a significant fraction of the protein population is inactive [3]. NOTE: Unless otherwise specified, °KD is reported in - log10 space. For example, a °KD of -9 corresponds to 10^-9 M or 1 nM.
- Change in observed affinity (Δ°KD): The shift in the observed affinity between two proteins upon mutation, reported as the log10-transformed fold change. A value of 1 reflects a 10-fold difference, a value of 2 a 100-fold difference, etc. This aggregate change resolves into two distinct biophysical components [2, 4]:
  - Protein-interaction change: The change in the intrinsic thermodynamic affinity between the two binding partners, each in its active state (i.e., the specific change in interface Gibbs free energy because both enthalpy and entropy are considered).
  - Protein-quality change: The change in the fraction of the mutated protein population that is biologically active, meaning it is expressed, correctly folded, and stable [2, 5].
    - Folding: The process that guides the polypeptide chain toward its native conformation, which is a prerequisite for forming a functional binding site.
    - Stability: The thermodynamic capacity to maintain the folded structure over time and under physiological conditions. Stability (decrease in Gibbs free energy from the unfolded to the folded state) ensures the binding interface remains intact and prevents competing processes such as aggregation [6].
    - Expression: The steady-state abundance of the protein. This is largely dependent on proper folding and stability, as cellular quality control mechanisms degrade proteins that fail to fold or remain stable at functional concentrations.
- Change in relative affinity (ΔΔ°KD): The difference between the Δ°KD of the primary VHH and that of the control VHH for a given epitope mutation.

Model-related terms
- ESM-IF1 sc: Single-chain (sc) structure-conditioned inverse folding model (ESM-IF1), using the isolated monomer structure of the mutated protein: either the VHH or the antigen [7].
- ESM-IF1 mc: Multi-chain (mc) structure-conditioned model (ESM-IF1), using the full complex structure (both antibody and antigen) [7].
- Stability prediction score: Score that represents the predicted change in stability based on a single mutation, normally represented as ΔΔG.
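
The affinity-related quantities defined in this nomenclature can be illustrated with a small worked example. This is a minimal sketch, not the authors' analysis code: the function names, the sign convention (mutant minus wild type, primary minus control), and all KD values are assumptions made for illustration only.

```python
import math


def delta_log_kd(kd_wt_molar: float, kd_mut_molar: float) -> float:
    """Change in observed affinity: log10 fold change in observed KD upon mutation.

    Assumed sign convention: log10(mutant KD) minus log10(wild-type KD),
    so a 10-fold weaker observed binding gives +1.
    """
    return math.log10(kd_mut_molar) - math.log10(kd_wt_molar)


def delta_delta_log_kd(primary_wt: float, primary_mut: float,
                       control_wt: float, control_mut: float) -> float:
    """Change in relative affinity: primary-VHH change minus control-VHH change
    for the same epitope mutation (assumed order of subtraction)."""
    return delta_log_kd(primary_wt, primary_mut) - delta_log_kd(control_wt, control_mut)


# Hypothetical observed KD values (in molar) for one antigen mutation.
primary_change = delta_log_kd(1e-9, 1e-7)   # primary VHH: 1 nM -> 100 nM
control_change = delta_log_kd(1e-9, 1e-8)   # control VHH: 1 nM -> 10 nM
relative_change = delta_delta_log_kd(1e-9, 1e-7, 1e-9, 1e-8)

print(round(primary_change, 3), round(control_change, 3), round(relative_change, 3))
```

In this toy case the control VHH, which does not contact the mutated position, still loses 10-fold in observed affinity, which the framework would attribute to a protein-quality change in the antigen; the remaining 10-fold loss seen only with the primary VHH is the part attributable to the interface itself.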

8
High-Throughput FRET Affinity Screening Technique (HTFAST) For Cell-Free Expressed Binding Protein Characterization

Hejazi, S. S.; Noroozi, K.; Jurasic, V.; Jarboe, L. R.; Reuel, N. F.

2026-02-13 bioengineering 10.64898/2026.02.12.697512 medRxiv
Top 0.1%
6.2%

The rapid engineering of high-affinity binding proteins, such as nanobodies and single-domain antibodies (sdAbs), is increasingly driven by cell-free, machine-learning-guided optimization. However, high-throughput, quantitative characterization of binding affinity remains a major bottleneck, particularly for proteins expressed in cell-free systems without purification. Here, we present the High-Throughput FRET Affinity Screening Technique (HTFAST) for rapid affinity characterization of binders expressed directly in crude E. coli cell-free protein synthesis (CFPS) reactions. HTFAST leverages Förster resonance energy transfer (FRET) between fluorescent-protein-fused binders and dye-labeled antigens to enable real-time, quantitative measurement of equilibrium dissociation constants. We systematically optimized fluorophore pairs and labeling parameters using the SpyTag003-SpyCatcher003 model system. Using donor-quenching and acceptor-emission FRET analyses, HTFAST reliably quantified nanomolar binding affinities in crude lysates for the SpyTag003-SpyCatcher003 model system. We validated the platform for nanobodies by characterizing a CD4-binding nanobody, Nb457, and benchmarking multiple SARS-CoV-2 receptor-binding domain sdAbs, demonstrating HTFAST's ability to rank binding strengths across a range of affinities. Finally, we demonstrate that both binding partners can be expressed directly in CFPS, further streamlining screening workflows. Overall, HTFAST provides a scalable, quantitative, and cell-free-compatible approach for high-throughput affinity screening, well suited for design-build-test-learn (DBTL) campaigns aimed at accelerating the development of next-generation binding proteins.

9
Validation and analysis of 12,000 AI-driven CAR-T designs in the Bits to Binders competition

Kosonocky, C. W.; Abel, A. M.; Feller, A. L.; Cifuentes Rieffer, A. E.; Woolley, P. R.; Lala, J.; Barth, D. R.; Gardner, T.; Ekker, S. C.; Ellington, A. D.; Wierson, W. A.; Marcotte, E. M.

2026-03-03 bioinformatics 10.64898/2026.03.03.709355 medRxiv
Top 0.1%
6.2%

Artificial intelligence (AI) methods for proteins have advanced rapidly, improving structure prediction and design, particularly for de novo binders. However, most evaluations emphasize binding affinity rather than higher-order biological function. We present Bits to Binders, a global competition benchmarking de novo binder design in the context of chimeric antigen receptor (CAR) T cells. Teams from 42 countries submitted 12,000 designs of 80-amino acid binders targeting human CD20 as CAR binding domains. Designs were screened by pooled CAR-T proliferation, identifying 707 designs exhibiting significant CD20-specific enrichment, with team hit rates from 0.6% to 38.4%. Top-performing candidates were validated as individual constructs, measuring CD20-specific proliferation, expansion, cytokine production, and targeted cell lysis. We identified common design methodologies and factors correlated with DNA synthesis, expression, and target-specific T cell activation which nearly double the success rates when applied as a retrospective filter. We release this dataset as an open resource, with practical recommendations to support more effective AI-driven binder design.

10
cloneXplorer: A high-throughput clone discovery platform based on conical microwell arrays

Stadler, G. K.; Tkachenko, E.; Neri, O.; Zakharov, M.; Zohar, O.; Deng, D. X.; Paraiso, K. D.; Rajaei, H.; Steele, S.; Shen, X.; Chenchik, A.; Yellen, B. B.

2026-01-20 cell biology 10.64898/2026.01.16.699323 medRxiv
Top 0.1%
6.1%

Antigen-specific T cell populations are of great value for studying immune recognition but tedious to generate by limiting dilution or cloning. Here, we develop a streamlined approach to generate antigen-specific T cell clones directly from peripheral blood using the cloneXplorer, a live-cell analysis and clone isolation platform based on conical microwell arrays. This platform continuously monitors cell proliferation, cytokine secretion, and surface markers in up to 100,000 single cell co-cultures, enabling the identification of rare, functionally defined T cells, which can be recovered for clonal expansion or sequence analysis. We benchmark the platform by performing several key demonstrations. First, we show that this platform can efficiently generate monoclonal cell populations from cell lines and human T cells. Next, we demonstrate that antigen-specificity can be identified at single cell resolution using a co-culture of Jurkat cells expressing NFAT-GFP, CD8, and a T cell receptor and K562 antigen presenting cells (APC) expressing a peptide library. Thereafter, we show that immune activation in mouse and human primary samples can be monitored by time lapse analysis of Interferon gamma (IFN-γ) secretion in individual microwell co-cultures using a fluorescent sandwich assay. Finally, we combine these capabilities in a proof-of-concept demonstration, which uses IFN-γ secretion and the presence of CD8 surface markers as hierarchical gates to isolate and expand antigen-specific T cells from human peripheral blood, and we verify their specificity by tetramer staining. Together, these results showcase potential applications of the cloneXplorer platform in cell line development, and in screening and validating immune receptor interactions with specific antigens.

11
Structure-Guided Computational Analysis of Linker effects in an scFv Targeting Guanylyl Cyclase C

Melo, R.; Viegas, T.

2026-04-01 bioinformatics 10.64898/2026.03.30.714862 medRxiv
Top 0.1%
4.9%

Single-chain variable fragments (scFvs) are widely used in diagnostic and therapeutic applications. These antibody fragments comprise two antibody variable domains connected by a flexible peptide linker whose properties critically influence folding, stability, oligomeric state, and antigen-binding. Therefore, careful linker selection represents a key step in scFv design. Guanylyl Cyclase C (GUCY2C) is a tumor-associated cell surface receptor expressed in gastrointestinal malignancies, including more than 90% of colorectal cancer (CRC) cases across all disease stages. Its restricted physiological expression pattern makes GUCY2C an attractive target for immunotherapy and precision oncology therapies. Here, we investigated the structural and functional consequences of incorporating alternative linker designs into an anti-GUCY2C scFv. Using molecular modeling, protein-protein docking, and molecular dynamics (MD) simulations, we evaluated the conformational stability, interdomain organization, and antigen-binding interactions of each construct. Our results provide a dynamic, structure-based assessment of how linker composition influences GUCY2C recognition and scFv structural behavior. Furthermore, this work establishes a computational framework for the rational optimization of GUCY2C-targeted antibody fragments.

12
SpeciefAI: Multi-species mRNA-level Antibody Framework Generation using Transformers

Grabarczyk, D.; Kocikowski, M.; Parys, M.; Cohen, S. B.; Alfaro, J. A.

2026-03-18 bioinformatics 10.64898/2026.03.16.712018 medRxiv
Top 0.1%
4.9%

Motivation: Encoding antibodies (Abs) and nanobodies (Nbs) as mRNA enables in vivo production of therapeutic proteins. However, this approach requires meeting two species-dependent requirements: the mRNA encoding must support efficient expression in the host species, and the encoded protein sequence must resemble the natural Ab repertoire of the recipient species to minimize immunogenicity. These requirements motivate species-conditioned generative models for joint mRNA and protein design. Results: We propose SpeciefAI, a transformer-based model for multi-species harmonisation of Ab and Nb sequences by generation of novel Framework Regions (FRs) tailored to input Complementarity-Determining Regions (CDRs). Our model works directly in the mRNA space and learns the correspondence between FRs and CDRs in six species. The model is capable of generating sequences with a highly similar distribution to natural sequences and a mean absolute difference in codon adaptation index (CAI) of 0.013 and 0.033 for humans and dogs, respectively. We show that the generated human sequences are highly human (0.95 T20 score) and canine sequences highly canine (0.95 cT20 score). We furthermore demonstrate that we can generate diverse candidate sequences using our method. Availability and Implementation: Source code is available at https://github.com/Dominko/SpeciefAI. OAS and COGNANO data are publicly available at https://opig.stats.ox.ac.uk/webapps/oas/ and https://cognanous.com/datasets/vhh-corpus (preprocessed versions available upon request). Canine data is available at https://zenodo.org/records/18301526.
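
For readers unfamiliar with the codon adaptation index mentioned in this abstract, the sketch below shows the standard definition: the geometric mean of per-codon relative adaptiveness weights. It is not taken from the SpeciefAI code; the function name, the toy weight table, and the example sequences are hypothetical, and real weights would be derived from a reference set of highly expressed genes in the host species.

```python
import math


def codon_adaptation_index(mrna: str, weights: dict) -> float:
    """Geometric mean of relative adaptiveness weights over the codons of an mRNA.

    Codons missing from the weight table are skipped (an assumption of this
    sketch); trailing bases that do not form a full codon are ignored.
    """
    usable_length = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable_length, 3)]
    log_weights = [math.log(weights[c]) for c in codons if c in weights]
    if not log_weights:
        raise ValueError("no codons with known weights")
    return math.exp(sum(log_weights) / len(log_weights))


# Hypothetical weights for two alanine codons (most-used codon has weight 1.0).
toy_weights = {"GCC": 1.0, "GCU": 0.5}

cai_natural = codon_adaptation_index("GCCGCC", toy_weights)    # both codons optimal
cai_generated = codon_adaptation_index("GCCGCU", toy_weights)  # one suboptimal codon

# Absolute CAI difference between the two toy sequences.
print(round(cai_natural, 4), round(cai_generated, 4),
      round(abs(cai_natural - cai_generated), 4))
```

The abstract reports a mean absolute CAI difference between generated and natural sequences; with a real codon table, the same absolute difference would be averaged over many sequence pairs rather than computed for a single toy pair as here.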

13
Translational Bayesian Pharmacokinetic Framework for Uncertainty-Aware First-in-Human Dose Selection of Therapeutic Monoclonal Antibodies

Rajbanshi, B.; Guruacharya, A.

2026-03-03 pharmacology and toxicology 10.64898/2026.02.28.708739 medRxiv
Top 0.1%
4.8%

First-in-human (FIH) dose selection for monoclonal antibodies (mAbs) typically relies on deterministic allometric scaling but lacks formal uncertainty quantification. While Bayesian methods have been widely applied in population PK modeling and dose individualization, their use for propagating uncertainty through allometric scaling in mAb FIH dose selection has not been systematically explored. This is a critical limitation for molecules with narrow therapeutic windows, such as CNS-targeted mAbs, where the blood-brain barrier restricts IgG penetration to ~0.1-0.3% of plasma concentrations, requiring high systemic doses that must be balanced against dose-limiting toxicities. To provide uncertainty-aware FIH dose recommendations, we developed and systematically evaluated a Bayesian hierarchical PK framework tested on CNS mAbs. By simultaneously learning population-level PK distributions and allometric scaling relationships from 9 well-characterized mAbs, the model propagates inter-antibody variability and scaling imprecision into full posterior predictive distributions. For validation, the framework was applied to 3 Alzheimer's disease mAbs (donanemab, lecanemab, and aducanumab), using only cynomolgus monkey PK data to predict human outcomes. Leave-one-out cross-validation yielded a mean absolute prediction error of 11.6% for human clearance. Predicted FIH doses were 10 mg/kg for donanemab and lecanemab, and 30 mg/kg for aducanumab. Retrospective comparison with clinical data showed prediction errors of -36.1%, -36.1%, and -15.7%, respectively, all within two-fold of observed values. The systematic under-prediction of clearance is attributable to target-mediated drug disposition not captured by the linear model. However, this bias is pharmacologically conservative, as it over-predicts systemic exposure to ensure wider safety margins. This framework enables risk-informed FIH dose selection by providing full probability distributions of predicted exposures rather than point estimates.
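
The deterministic allometric scaling that this abstract contrasts with its Bayesian approach, and the "within two-fold" criterion, can be sketched as follows. This is an illustrative point-estimate calculation only, not the authors' hierarchical model: the function names, the exponent of 0.85, and the body weights are assumptions, since the abstract does not report them.

```python
def scale_clearance(cl_animal: float, bw_animal_kg: float,
                    bw_human_kg: float = 70.0, exponent: float = 0.85) -> float:
    """Simple allometric scaling: CL_human = CL_animal * (BW_human / BW_animal) ** exponent.

    The exponent and default human body weight are assumptions of this sketch.
    """
    return cl_animal * (bw_human_kg / bw_animal_kg) ** exponent


def percent_error(predicted: float, observed: float) -> float:
    """Signed prediction error in percent; negative means under-prediction."""
    return 100.0 * (predicted - observed) / observed


def within_two_fold(error_pct: float) -> bool:
    """True if predicted/observed lies between 0.5 and 2."""
    ratio = 1.0 + error_pct / 100.0
    return 0.5 <= ratio <= 2.0


# Prediction errors quoted in the abstract for the three validation mAbs.
reported_errors = {"donanemab": -36.1, "lecanemab": -36.1, "aducanumab": -15.7}
two_fold_check = {name: within_two_fold(err) for name, err in reported_errors.items()}
print(two_fold_check)
```

A signed error of -36.1% corresponds to a predicted-to-observed ratio of about 0.64, which is why all three reported errors satisfy the two-fold criterion. The Bayesian framework described in the abstract would replace the single scaled value with a posterior predictive distribution.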

14
ABFormer: A Transformer-based Model to Enhance Antibody-Drug Conjugates Activity Prediction through Contextualized Antibody-Antigen Embedding

Katabathuni, R.; Loka, V.; Gogte, S.; Kondaparthi, V.

2026-02-05 bioinformatics 10.64898/2026.02.03.703522 medRxiv
Top 0.1%
4.4%

Computational screening is becoming a crucial part of Antibody-Drug Conjugate (ADC) research: it allows dead ends to be eliminated at earlier stages and effort to be concentrated on promising candidates, which can significantly reduce the cost of development. The current state-of-the-art deep learning model, ADCNet, treats antibodies, antigens, linkers, and payloads as distinct features. However, this overlooks the complex context of antibody-antigen binding, which is primarily responsible for the targeting and uptake of ADCs. To address this limitation, we present ABFormer, a transformer-based framework tailored for ADC activity prediction and in-silico triage. ABFormer integrates high-resolution antibody-antigen interface information through a pretrained interaction encoder and combines it with chemically enriched linker and payload representations obtained from a fine-tuned molecular encoder. This multi-modal design replaces naive feature concatenation with biologically informed contextual embeddings that more accurately reflect molecular recognition. ABFormer outperforms the baselines in leave-pair-out evaluation and achieves 100% accuracy on a separate test set of 22 novel ADCs, while the baselines are severely mis-calibrated. An ablation study confirms that the predictive capability is predominantly driven by interaction-aware antibody-antigen representations, while small-molecule encoders enhance specificity by reducing false positives. In conclusion, ABFormer provides a reliable and efficient platform for early filtering of ADC activity and selection of candidates.

15
Universal Baseline for in vitro Selection of Genetically Encoded Libraries

Yan, K.; Lima, G. M.; Bahadur, T.; Albert, V.; O'Gara, Z.; Bao, G.; Kossmann, C.; Kirby, W.; Mejia, F. B.; Michnik, M. L.; Maiorana, K.; Derda, R.

2026-02-15 biochemistry 10.64898/2026.02.14.705946 medRxiv
Top 0.1%
4.1%

Genetically encoded (GE) libraries enable identification of high-affinity ligands for diverse molecular targets through iterative in vitro selection and DNA sequencing or next-generation sequencing (NGS). Despite their impact in therapeutic development, a systematic framework for evaluating reproducibility in GE-molecular discoveries remains limited. To aid such analysis, we introduce the concept of baseline response, which reproducibly partitions active and inactive members of in vitro selection. The baseline response is provided by spiking a random DNA-barcoded population. We calibrated the baseline concept using Bioconductor EdgeR differential enrichment (DE) analysis of NGS of phage-displayed selection on oligosaccharide chitin and hepatitis virus NS3a* protease as model targets. We further show that mixing discovery campaigns also offers an effective baseline: chitin-enriched peptides serve as a baseline for DE-analysis of NS3a* selection and NS3a*-enriched peptides serve as a baseline for chitin binders. We applied baseline-stratified DE-analysis to 66 parallel selections performed in 3-5 replicates across 22 extracellular targets, including HER1-3, EpCAM, CAIX, PD-L1, and eight integrin receptors. Automated DE-analysis across hundreds of NGS files produced hits validated in a secondary screen and yielded synthetic macrocyclic ligands with mid-nanomolar affinity confirmed in 2-3 biophysical assays. For PD-L1, we further demonstrated how baseline-calibrated NGS data provide decision-enabling information for optimization of peptide macrocycles to yield potent single-digit nanomolar ligands for the cell-surface receptor. We anticipate that baseline-based analyses of NGS data from in vitro selection procedures will offer a scalable framework for reproducible hit discovery and standardized analysis across diverse in vitro selection campaigns. 
Significance Statement: Genetically encoded selection technologies such as phage, mRNA, and ribosome display have produced FDA-approved therapeutics and numerous clinical candidates. Yet reproducibility in such in vitro discovery systems is rarely evaluated against a defined experimental baseline. Here, we establish a universal baseline by spiking unrelated, DNA-barcoded peptide sequences into selection libraries and quantifying their binding alongside target-enriched populations. This composition-agnostic strategy enables rigorous normalization, confidence assessment, and cross-target comparison of molecular discovery outcomes. Our framework introduces practical standards for reproducibility and statistical benchmarking across genetically encoded display platforms.

16
Dynamic multimodal survival prediction in multiple myeloma integrating gene expression, longitudinal laboratories, and treatment history

JIA, S.; Lysenko, A.; Boroevich, K. A.; Sharma, A.; Tsunoda, T.

2026-04-01 bioinformatics 10.64898/2026.03.30.715136 medRxiv
Top 0.1%
3.7%

Prognostic stratification in multiple myeloma (MM) relies on staging systems that assign patients to fixed categories at diagnosis and discard the temporal information that accumulates during treatment. We developed a dynamic multimodal framework that predicts residual overall survival using observation windows ranging from 1 to 18 months post-diagnosis. The model integrates DeepInsight-transformed gene expression representation, longitudinal laboratory measurement trajectories across 10 analytes, and treatment history for three drug classes through an adaptive fusion mechanism that accounts for missing clinical observations. On the MMRF CoMMpass cohort (n = 752), five-fold cross-validation yielded a concordance index (C-index) of 0.773 ± 0.024 and a time-dependent AUC at a 1-year prediction horizon (tdAUC1yr) of 0.789 ± 0.021, outperforming all evaluated baseline methods including DeepSurv (0.633 ± 0.095) and random survival forests (0.636 ± 0.024) on matched cross-validation splits. Modality ablation identified longitudinal laboratory measurements as the strongest individual contributor (C-index 0.693); the DeepInsight spatial encoding of gene expression yielded higher discrimination than a multilayer perceptron (MLP) baseline operating on the same features (0.624 vs. 0.596). Kaplan-Meier analysis showed significant prognostic group separation at all primary landmarks (log-rank p < 0.001; hazard ratios 3.46-3.93). A distilled student model retaining only the DeepInsight representation and five baseline clinical features achieved C-index 0.672 and tdAUC1yr 0.740 on an independent microarray cohort (GSE24080, n = 507) without retraining. Interpretability analysis identified prognostic associations consistent with established myeloma biology, including ubiquitin-proteasome pathway genes, endoplasmic reticulum stress markers, and Interferon Alpha Response pathway enrichment.
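
For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's C-index for right-censored data. It is an illustration only, not the authors' evaluation code: the function name `concordance_index`, the argument names, and the toy data are assumptions, and pairs with tied observed times are simply skipped as a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch, not the preprint's code).

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ended in an event.
        # Tied times are skipped here (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk died first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.6, 0.7, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
print(f"C-index on toy data: {toy_c:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.773 and 0.633 figures above are compared.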

17
Structural Plausibility Without Binding Specificity: Limits of AI-Based Antibody-Antigen Structure Prediction Confidence Scores

Smorodina, E.; Ali, M.; Kropivsek, K.; Salicari, L.; Miklavc, S.; Kappassov, A.; Fu, C.; Sormanni, P.; de Marco, A.; Greiff, V.

2026-03-03 bioinformatics 10.64898/2026.03.02.709004 medRxiv
Top 0.1%
3.6%

Antibody-antigen binding prediction remains a central challenge for AI-driven therapeutic discovery, particularly in discriminating cognate interactions from structurally plausible but incorrect pairings. We present a controlled, AI-method- and antibody-format-agnostic evaluation framework that measures binding specificity under realistic conditions. Using 106 experimentally determined single-chain antibody (nanobody)-antigen complexes and 11,342 shuffled non-cognate pairings, we benchmarked publicly available state-of-the-art structure prediction methods (AlphaFold3, Boltz-2, Chai-1). Although the methods tested often generated geometrically plausible complexes, internal confidence metrics (ipTM) frequently failed to discriminate correct from incorrect pairings. Increased sampling improved structural refinement but not pairing discrimination, indicating that computational resources are better allocated across independent seeds and explicit negative controls. We conclude that internal confidence scores are not inherently calibrated to binding specificity and require validation against realistic decoys. To enable community benchmarking and method development, we release ~1.8 million AI-generated complex structures and guidance for the benchmarks ahead.

18
Optimizing broadly neutralizing antibodies via all-atom interaction modeling and pre-trained language models

Song, Y.; Wu, F.; Wang, R.; He, B.; Yan, Q.; Huang, X.; Chen, S.; Yuan, Q.; Rao, J.; Tang, Z.; He, H.; Zhao, J.; Yang, Y.; Yao, J.

2026-01-21 bioinformatics 10.64898/2026.01.20.700456 medRxiv
Top 0.1%
3.6%

Antibody optimization is a fundamental challenge, and the identification of antibody-antigen interactions is crucial in the optimization process. However, current methods cannot accurately predict antibody-antigen interactions because they lack all-atom modeling, and so cannot improve on time-consuming and costly traditional optimization techniques. We present InterAb, a novel model developed for predicting antibody-antigen interactions and optimizing antibodies through all-atom modeling and antibody language models. Leveraging the proposed all-atom modeling approach, AtomInter, and pre-trained antibody language models, InterAb outperforms existing methods in predicting antibody specificity and antibody-antigen binding affinity. In the antibody library we constructed, InterAb successfully identified antibodies capable of binding to influenza A virus. An antibody optimization framework, InterAb-Opt, was further developed for the optimization of broadly neutralizing antibodies. For the R1-32 antibody, biolayer interferometry results reveal that 85%, 80%, 90%, and 67.5% of the 40 optimized antibodies exhibit enhanced binding affinities to wild-type SARS-CoV-2, Lambda, BQ.1.1, and EG.5.1, respectively, with a maximum improvement of up to 96-fold. For the newly emerging BA.2.86 and KP.3, 55% and 52.5% of the optimized antibodies notably transition from non-binding to binding. Neutralization assays demonstrated that the optimized antibody exhibited enhanced neutralization activity across multiple targets, highlighting the capability of InterAb-Opt in engineering broadly neutralizing antibodies. This technology enables precise analysis of antibody-antigen interactions and optimization of broadly neutralizing antibodies, holding promise for addressing challenges in immune evasion and vaccine design.

19
Divergent disulfide bond architecture defines two IgY subclasses in snakes

Gambon Deza, F.

2026-02-12 immunology 10.64898/2026.02.11.705265 medRxiv
Top 0.1%
3.5%

Immunoglobulin Y (IgY) represents the major serum antibody in reptiles and birds, serving as the evolutionary precursor to mammalian IgG and IgE. While IgY diversification has been documented in several reptilian lineages, the structural basis underlying subclass divergence remains poorly understood. Here, we present a comprehensive phylogenetic and structural analysis of IgY sequences from 20 snake species, revealing two distinct evolutionary lineages (A and B) that arose through gene duplication. Structural modeling of the constant regions from Arizona elegans identified a fundamental difference in the light chain-heavy chain (CL-CH1) disulfide bond architecture between lineages. Lineage B utilizes CYS16 in the CH1 domain (alignment position 13) for the inter-chain disulfide bond with the light chain CYS98, whereas Lineage A employs CYS136 (alignment position 99), representing N-terminal versus C-terminal positioning within the CH1 domain. Analysis of 50 diagnostic amino acid positions between lineages revealed that changes are distributed across all constant domains (CH1-CH4), with 13 positions showing radical substitutions affecting charge or polarity. Sliding window dN/dS analysis demonstrated purifying selection (ω < 1) across both lineages, consistent with functional constraint following duplication. These findings provide structural evidence for subfunctionalization of snake IgY genes and suggest that alternative disulfide bond configurations may confer distinct biophysical or functional properties to each antibody subclass. This work advances our understanding of immunoglobulin evolution in reptiles and highlights the structural plasticity of antibody architecture.

20
Evaluating codon optimization strategies for mammalian glycoprotein production with an open-source expression vector

Yang, C.; Soni, R.; Visconti, S. E.; Abdollahi, M.; Belay, F.; Ghosh, A.; Duvall, S. W.; Walton, C. J. W.; Meijers, R.; Zhu, H.

2026-03-20 molecular biology 10.64898/2026.03.18.712111 medRxiv
Top 0.1%
3.2%

Efficient production of human proteins for the development of tool compounds and biologics depends on a detailed understanding of the protein expression machinery in mammalian cells. Codon optimization is widely believed to enhance protein yield, yet its impact in homologous mammalian systems remains poorly defined. Here, we systematically compare five codon usage strategies reflecting common assumptions about rare codons, RNA stability, and synthesis efficiency. We developed pTipi, an efficient open-source mammalian expression vector, and evaluated its performance in antibody production. We generated plasmids for common epitope tag antibodies such as V5, anti-biotin and anti-His for distribution by Addgene. To compare codon usage schemes, we performed a bake-off of 18 human and murine Wnt pathway glycoproteins in mammalian cells. Small-scale expression screens revealed that codon optimization did not provide a general advantage over native coding sequences, while strategies prioritizing RNA stability consistently reduced expression. Interestingly, a skewed codon scheme using the most abundant codons produced yields comparable to native sequences and occasionally enhanced protein output. To enable flexible evaluation of codon strategies, we implemented a Golden Gate-compatible pTipi platform for efficient synthetic gene incorporation. We conclude that native codons are sufficient for robust homologous mammalian expression of glycoproteins, while selective codon skewing can be beneficial for some targets.