mAbs

Informa UK Limited

Preprints posted in the last 30 days, ranked by how well they match mAbs's content profile, based on 28 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
KyDab - a comprehensive database of antibody discovery selection campaigns.

Zhou, Q.; Chomicz, D.; Melvin, D.; Griffiths, M.; Yahiya, S.; Reece, S.; Le Pannerer, M.-M.; Krawczyk, K.

2026-03-27 bioinformatics 10.64898/2026.03.25.713450 medRxiv
Top 0.1%
26.3%

Preclinical antibody discovery relies on progressive screening and down-selection of candidate antibodies from large immune repertoires, yet this critical process is poorly represented in existing public databases. Here we introduce KyDab (Kymouse Antibody Database), a well-curated database of antibody discovery selection data generated using standardized workflows on the Kymouse humanized mouse platform. The current release includes 11 immunisation studies in Kymouse platform mice covering 51 immunogens, more than 120,000 paired heavy-light chain sequences, and binding measurements for a selected subset of experimentally characterized clones. By capturing full-funnel selection data with consistent metadata and both positive and negative experimental outcomes, KyDab provides a valuable data resource for the development and evaluation of artificial intelligence models for antibody discovery. KyDab is accessible at https://kydab.naturalantibody.com, and the database will be continuously updated as new datasets become available.

2
Surface Display For Phage Assisted Continuous Evolution: A Platform For Evolving / Screening Nanobodies In Prokaryote Systems

Flores-Mora, F. E.; Brodsky, J.; Cerna, G. M.; Tse, A.; Hoover, R. L.; Bartelle, B. B.

2026-04-04 synthetic biology 10.64898/2026.04.03.716437 medRxiv
Top 0.1%
9.9%

Despite >50 years of methods development, specific antibodies are still generated at low throughput and remain in high demand across biotechnology. Most biologics and immunoprobes are monoclonal antibodies, developed using a combination of inoculating animals with a target antigen, engineered candidate libraries, and multiple rounds of selection using phage or yeast display. Here we introduce a synthetic biology scheme to eliminate the need for nearly all of these steps, by combining Surface display on E. coli and Phage display with the microvirus ΦX174, Assisting Continuous Evolution (SurPhACE). Instead of building libraries for screening, SurPhACE runs a closed evolutionary program. A typical experiment can have 10^11 mutant candidates under active selection, with complete turnover of the mutant population every 30 min, or >5 × 10^12 unique mutants per day, using less than 100 mL of bacterial culture media. We demonstrate SurPhACE for optimizing a nanobody to a related epitope, and develop novel nanobodies for an arbitrary target using a minimal starting library to establish a proof of concept and identify best practices for this scalable method for generating protein binders.

3
Effects of protein interface mutations on protein quality and affinity

de Kanter, J. K.; Smorodina, E.; Minnegalieva, A.; Arts, M.; Blaabjerg, L. M.; Frolenkova, M.; Rawat, P.; Wolfram, L.; Britze, H.; Wilke, Y.; Weissenborn, L.; Lindenburg, L.; Engelhart, E.; McGowan, K. L.; Emerson, R.; Lopez, R.; van Bemmel, J. G.; Demharter, S.; Spreafico, R.; Greiff, V.

2026-03-26 molecular biology 10.64898/2026.03.24.713863 medRxiv
Top 0.1%
7.2%

Accurately modeling antibody-antigen interactions requires distinguishing intrinsic binding affinity ("protein-interaction") from protein biophysical properties ("protein-quality"), including folding, stability, and expression. However, high-throughput mutational measurements commonly used to train and benchmark computational models often conflate these effects, obscuring the true determinants of molecular recognition. Here, we present an experimental and analytical framework to disentangle protein-interaction effects from protein-quality effects in single-domain antibody (VHH)-antigen binding. Using a large-scale deep mutational scanning (DMS) dataset spanning four VHH-antigen complexes, with single and double mutations in both partners, we introduce control binders to quantify protein-quality changes independently of protein-interaction. This enables decomposition of experimentally measured affinity into protein-interaction and protein-quality components at scale. Leveraging the disentangled dataset, we evaluated state-of-the-art structure- and sequence-based models for protein-quality and protein-interaction prediction and show that their performance largely reflects protein-quality rather than protein-interaction effects. Our results highlight a major confounder in current datasets and suggest that accounting for protein-quality will be essential for training next-generation affinity-prediction models.

Nomenclature

Antibody-related terms
- Primary VHH: The VHH of a VHH-antigen complex for which the paratope and the epitope were mutated.
- Control VHH: A second VHH that binds to the same antigen as the primary VHH but has non-overlapping epitope positions and therefore does not bind to any of the mutated antigen positions.

Affinity-related terms
- Real affinity: "The strength of the interaction between two [...] molecules that bind reversibly (interact)" [1]. In the context of antibody-antigen binding, it quantifies interactions between active proteins (which are expressed and correctly folded [2] and are therefore functionally and biologically active; see below). It is commonly quantified by the equilibrium dissociation constant, KD.
- Observed affinity (°KD): The interaction strength experimentally measured between two molecules. Unlike real affinity, this value is confounded by the biophysical properties of the individual binding partners, specifically their folding, stability, and expression levels. Consequently, the observed affinity often differs from the real/intrinsic affinity if a significant fraction of the protein population is inactive [3]. NOTE: Unless otherwise specified, °KD is reported in -log10 space. For example, a °KD of -9 corresponds to 10^-9 M or 1 nM.
- Change in observed affinity (Δ°KD): The shift in the observed affinity between two proteins upon mutation, reported as the log10-transformed fold change. A value of 1 reflects a 10-fold difference, a value of 2 a 100-fold difference, etc. This aggregate change resolves into two distinct biophysical components [2, 4]:
  - Protein-interaction change: The change in the intrinsic thermodynamic affinity between the two binding partners, each in its active state (i.e., the specific change in interface Gibbs free energy because both enthalpy and entropy are considered).
  - Protein-quality change: The change in the fraction of the mutated protein population that is biologically active, meaning it is expressed, correctly folded, and stable [2, 5].
    - Folding: The process that guides the polypeptide chain toward its native conformation, which is a prerequisite for forming a functional binding site.
    - Stability: The thermodynamic capacity to maintain the folded structure over time and under physiological conditions. Stability (decrease in Gibbs free energy from the unfolded to the folded state) ensures the binding interface remains intact and prevents competing processes such as aggregation [6].
    - Expression: The steady-state abundance of the protein. This is largely dependent on proper folding and stability, as cellular quality control mechanisms degrade proteins that fail to fold or remain stable at functional concentrations.
- Change in relative affinity (ΔΔ°KD): The difference between the Δ°KD of the primary VHH compared to the control VHH for a given epitope mutation.

Model-related terms
- ESM-IF1 sc: Single-chain (sc) structure-conditioned inverse folding model (ESM-IF1), using the isolated monomer structure of the mutated protein: either the VHH or the antigen [7].
- ESM-IF1 mc: Multi-chain (mc) structure-conditioned model (ESM-IF1), using the full complex structure (both antibody and antigen) [7].
- Stability prediction score: Score that represents the predicted change in stability based on a single mutation, normally represented as ΔΔG.
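The affinity-related definitions in this abstract can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the function names and the KD values are hypothetical, and the sign convention (a positive value means a larger KD after mutation) is assumed rather than taken from the preprint. It computes the log10 fold change in observed KD for a primary and a control VHH, then takes their difference as the change in relative affinity.

```python
import math


def delta_log10_kd(kd_mutant_molar, kd_wildtype_molar):
    """Log10 fold change in observed KD upon mutation.

    A value of 1 means a 10-fold difference, 2 a 100-fold difference.
    Assumed sign convention: positive means the mutant KD is larger.
    """
    if kd_mutant_molar <= 0 or kd_wildtype_molar <= 0:
        raise ValueError("KD values must be positive")
    return math.log10(kd_mutant_molar / kd_wildtype_molar)


def relative_affinity_change(primary_delta, control_delta):
    """Difference between the primary-VHH and control-VHH changes
    for the same antigen mutation."""
    return primary_delta - control_delta


# Hypothetical numbers for one antigen mutation (not from the preprint).
primary = delta_log10_kd(kd_mutant_molar=1e-7, kd_wildtype_molar=1e-9)
control = delta_log10_kd(kd_mutant_molar=1e-8, kd_wildtype_molar=1e-9)
relative = relative_affinity_change(primary, control)

print(round(primary, 6), round(control, 6), round(relative, 6))
```

If the control VHH is assumed to report only the protein-quality effect of the antigen mutation, the remaining difference would be attributed to the protein-interaction component; that interpretation follows the abstract's description and is not verified here.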

4
Structure-Guided Computational Analysis of Linker effects in an scFv Targeting Guanylyl Cyclase C

Melo, R.; Viegas, T.

2026-04-01 bioinformatics 10.64898/2026.03.30.714862 medRxiv
Top 0.1%
4.9%

Single-chain variable fragments (scFvs) are widely used in diagnostic and therapeutic applications. These antibody fragments comprise two antibody variable domains connected by a flexible peptide linker whose properties critically influence folding, stability, oligomeric state, and antigen-binding. Therefore, careful linker selection represents a key step in scFv design. Guanylyl Cyclase C (GUCY2C) is a tumor-associated cell surface receptor expressed in gastrointestinal malignancies, including more than 90% of colorectal cancer (CRC) cases across all disease stages. Its restricted physiological expression pattern makes GUCY2C an attractive target for immunotherapy and precision oncology therapies. Here, we investigated the structural and functional consequences of incorporating alternative linker designs into an anti-GUCY2C scFv. Using molecular modeling, protein-protein docking, and molecular dynamics (MD) simulations, we evaluated the conformational stability, interdomain organization, and antigen-binding interactions of each construct. Our results provide a dynamic, structure-based assessment of how linker composition influences GUCY2C recognition and scFv structural behavior. Furthermore, this work establishes a computational framework for the rational optimization of GUCY2C-targeted antibody fragments.

5
Dynamic multimodal survival prediction in multiple myeloma integrating gene expression, longitudinal laboratories, and treatment history

JIA, S.; Lysenko, A.; Boroevich, K. A.; Sharma, A.; Tsunoda, T.

2026-04-01 bioinformatics 10.64898/2026.03.30.715136 medRxiv
Top 0.1%
3.7%

Prognostic stratification in multiple myeloma (MM) relies on staging systems that assign patients to fixed categories at diagnosis and discard the temporal information that accumulates during treatment. We developed a dynamic multimodal framework that predicts residual overall survival using observation windows ranging from 1 to 18 months post-diagnosis. The model integrates DeepInsight-transformed gene expression representation, longitudinal laboratory measurement trajectories across 10 analytes, and treatment history for three drug classes through an adaptive fusion mechanism that accounts for missing clinical observations. On the MMRF CoMMpass cohort (n = 752), five-fold cross-validation yielded a concordance index (C-index) of 0.773 ± 0.024 and a time-dependent AUC at a 1-year prediction horizon (tdAUC1yr) of 0.789 ± 0.021, outperforming all evaluated baseline methods including DeepSurv (0.633 ± 0.095) and random survival forests (0.636 ± 0.024) on matched cross-validation splits. Modality ablation identified longitudinal laboratory measurements as the strongest individual contributor (C-index 0.693); the DeepInsight spatial encoding of gene expression yielded higher discrimination than a multilayer perceptron (MLP) baseline operating on the same features (0.624 vs. 0.596). Kaplan-Meier analysis showed significant prognostic group separation at all primary landmarks (log-rank p < 0.001; hazard ratios 3.46-3.93). A distilled student model retaining only the DeepInsight representation and five baseline clinical features achieved C-index 0.672 and tdAUC1yr 0.740 on an independent microarray cohort (GSE24080, n = 507) without retraining. Interpretability analysis identified prognostic associations consistent with established myeloma biology, including ubiquitin-proteasome pathway genes, endoplasmic reticulum stress markers, and Interferon Alpha Response pathway enrichment.
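For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of one common estimator, Harrell's C-index, written in plain Python. The abstract does not say which estimator the authors used, so this is an assumption, and the function name and toy data are hypothetical. A pair of patients is comparable when the one with the shorter follow-up time had an observed event; the pair is concordant when that patient also received the higher predicted risk.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
toy_c = harrell_c_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
print(toy_c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported values between roughly 0.6 and 0.8.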

6
Evaluating codon optimization strategies for mammalian glycoprotein production with an open-source expression vector

Yang, C.; Soni, R.; Visconti, S. E.; Abdollahi, M.; Belay, F.; Ghosh, A.; Duvall, S. W.; Walton, C. J. W.; Meijers, R.; Zhu, H.

2026-03-20 molecular biology 10.64898/2026.03.18.712111 medRxiv
Top 0.1%
3.2%

Efficient production of human proteins for the development of tool compounds and biologics depends on a detailed understanding of the protein expression machinery in mammalian cells. Codon optimization is widely believed to enhance protein yield, yet its impact in homologous mammalian systems remains poorly defined. Here, we systematically compare five codon usage strategies reflecting common assumptions about rare codons, RNA stability, and synthesis efficiency. We developed pTipi, an efficient open-source mammalian expression vector, and evaluated its performance in antibody production. We generated plasmids for common epitope tag antibodies such as V5, anti-biotin and anti-His for distribution by Addgene. To compare codon usage schemes, we performed a bake-off of 18 human and murine Wnt pathway glycoproteins in mammalian cells. Small-scale expression screens revealed that codon optimization did not provide a general advantage over native coding sequences, while strategies prioritizing RNA stability consistently reduced expression. Interestingly, a skewed codon scheme using the most abundant codons produced yields comparable to native sequences and occasionally enhanced protein output. To enable flexible evaluation of codon strategies, we implemented a Golden Gate-compatible pTipi platform for efficient synthetic gene incorporation. We conclude that native codons are sufficient for robust homologous mammalian expression of glycoproteins, while selective codon skewing can be beneficial for some targets.

7
evedesign: accessible biosequence design with a unified framework

Hopf, T. A.; Gazizov, A.; Garcia Busto, S.; Eschbach, E.; Lee, S.; Mirdita, M.; Orenbuch, R.; Belahsen, K.; Ross, D.; Sander, C.; Steinegger, M.; d'Oelsnitz, S.; Marks, D.

2026-03-19 bioinformatics 10.64898/2026.03.17.712115 medRxiv
Top 0.1%
3.1%

Machine learning methods for protein engineering are rarely interoperable, require bespoke workflows, and remain inaccessible to non-experts. Yet the design problems that matter most - conditional design subject to real-world constraints, multi-objective optimization, and iterative lab-in-the-loop workflows where experimental data continuously refines successive design rounds - demand exactly the kind of flexible, composable infrastructure that no single tool provides. We present evedesign, a unified open-source framework that formalizes conditional biosequence design in a method-agnostic way, enabling complex multi-objective workflows combining supervised and unsupervised models from standardized specifications, and built from the outset to support iterative experimental integration. An interactive web interface facilitates end-to-end design for a broad scientific audience at https://evedesign.bio. We demonstrate evedesign's utility in antibody engineering, enzyme design, and natural enzyme discovery, and invite open-source community contributions.

8
Explainable protein-protein binding affinity prediction via fine-tuning protein language models

Singh, H.; SINGH, R. K.; Srivastava, S. P.; Pradhan, S.; Gorantla, R.

2026-04-01 bioinformatics 10.64898/2026.03.30.715237 medRxiv
Top 0.2%
1.7%

Predicting protein-protein binding affinity from sequence alone remains a bottleneck for antibody optimization, biologics design and large-scale affinity modelling. Structure-based methods achieve high accuracy but cannot scale when complex structures are unavailable. Here we present a framework that reframes affinity prediction as metric learning: two proteins are projected into a shared latent space in which cosine similarity directly correlates with experimental binding affinity, and the protein language model encoder is adapted through parameter-efficient fine-tuning (PEFT). On the PPB-Affinity benchmark, the model achieves Pearson r = 0.89 on a random split, generalises to evolutionarily distant proteins (r = 0.61 at < 30% sequence identity) and surpasses structure-based deep learning baselines across biological subgroups, without any three-dimensional input. On the strictly de-overlapped AB-Bind dataset, few-shot adaptation with 30% of assay data (Pearson r = 0.756, RMSE = 0.688) outperforms methods trained on 90% of data; consistent gains are observed across nine diverse AbBiBench deep-mutational-scanning assays with 10-30% labelled variants. Residue-level explainability reveals that the model concentrates importance on interface-localised residues aligned with experimentally validated interaction hotspots across enzyme-inhibitor and antibody-antigen systems. Together, these results establish a scalable, explainable and data-efficient route to protein-protein binding affinity prediction and therapeutic antibody optimisation from sequence alone.
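The scoring step of the metric-learning formulation described above can be sketched as follows. This shows only the cosine similarity between two already-projected embedding vectors; the encoder, the projection, and the training objective are not reproduced, and the vectors and the function name are hypothetical placeholders rather than anything from the preprint.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical latent vectors for two binding partners.
protein_a = [0.2, -0.5, 0.8, 0.1]
protein_b = [0.1, -0.4, 0.9, 0.0]

score = cosine_similarity(protein_a, protein_b)
print(round(score, 3))
```

In such a setup the similarity score would then be compared against, or mapped onto, the experimental affinity scale; how the authors perform that mapping is not stated in the abstract.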

9
De novo design of a peptide ligand for specific affinity purification of human complement C1q

Tsuchihashi, R.; Kinoshita, M.; Aino, H.

2026-04-01 bioinformatics 10.64898/2026.03.30.714096 medRxiv
Top 0.2%
1.7%

Affinity purification is an essential technique for isolating highly purified proteins; however, generating affinity ligands requires significant time and financial investment. To address these limitations, this study proposes a novel affinity chromatography method utilizing in silico-designed cyclic peptides as ligands. Targeting Complement C1q (C1q), a plasma protein that plays crucial roles in the classical complement pathway, we employed the biomolecular structure prediction model, AlphaFold2, to design specific binding cyclic peptides. Based on these designs, we synthesized lariat-type cyclic peptides characterized by disulfide cyclization and biotinylation, which were subsequently immobilized on streptavidin carriers. Performance tests confirmed that the resulting column specifically captured C1q, allowing for elution via a standard NaCl concentration gradient. Notably, high selectivity was preserved even in the presence of plasma, underscoring the ligand's practical robustness. By overcoming traditional constraints through (1) rapid and simple design, (2) high specificity, and (3) universal versatility without genetic modification, this de novo design strategy represents a potential breakthrough in protein purification technologies.

Highlights
- AI-driven de novo design generated a specific cyclic peptide ligand for Complement C1q
- The synthetic ligand enabled one-step purification of Complement C1q directly from human plasma
- Mild elution conditions preserved the target's oligomeric structure and native interactome
- This label-free strategy offers a rapid, low-cost alternative to antibody-based chromatography

10
GYDE: A collaborative drug discovery platform for AI-powered protein design and engineering

Down, T.; Warowny, M.; Walker, A.; D'Ascenzo, L.; Lee, D.; Zhou, Z.; Cao, S.; Bainbridge, T. W.; Nicoludis, J. M.; Harris, S. F.; Mukhyala, K.

2026-03-27 bioinformatics 10.64898/2026.03.24.714039 medRxiv
Top 0.2%
1.6%

As computational tools and machine learning models for protein sciences continue to advance and proliferate, bench scientists face increasing technical challenges adopting these tools for specific applications such as drug discovery. Here we present GYDE (Guide Your Design and Engineering), an open-source, versatile, and web-based collaboration platform designed to make computational analyses of proteins and antibodies easily accessible to bench scientists. GYDE enables the exploration of sequence-structure-function relationships through a tightly integrated visual interface, offering researchers a comprehensive exploration of protein functional determinants either via real assay data or computational tools. GYDE's intuitive interface facilitates seamless access to cutting-edge AI models for protein and antibody structure prediction, design, and downstream analyses. The flexible and easy addition of new tools and models is facilitated by the use of the Slivka compute API. The platform supports saved sessions that enable researchers to easily share their findings with other users, fostering a more collaborative scientific community. GYDE is freely available for protein scientists in academia and industry to build drug discovery analytics platforms customized to their needs.

11
Efficient generation of epitope-targeted de novo antibodies with Germinal

Mille-Fragoso, L. S.; Driscoll, C. L.; Wang, J. N.; Dai, H.; Widatalla, T. M.; Zhang, J. L.; Zhang, X.; Rao, B.; Feng, L.; Hie, B. L.; Gao, X. J.

2026-04-15 synthetic biology 10.1101/2025.09.19.677421 medRxiv
Top 0.2%
1.6%

Obtaining novel antibodies against specific protein targets is a widely important yet experimentally laborious process. Meanwhile, computational methods for antibody design have been limited by low success rates that currently require resource-intensive screening. Here, we introduce Germinal, a broadly enabling generative pipeline that designs antibodies against specific epitopes with nanomolar binding affinities while requiring only low-n experimental testing. Our method co-optimizes antibody structure and sequence by integrating a structure predictor with an antibody-specific protein language model to perform de novo design of functional complementarity-determining regions (CDRs) onto a user-specified structural framework. When tested against four diverse protein targets, Germinal successfully designed functional antibodies across all targets and binder formats, testing only 43-101 designs for each antigen. Validated designs also exhibited robust expression in mammalian cells and high sequence and structural novelty. We provide open-source code and full computational and experimental protocols to facilitate wide adoption. Germinal represents a milestone in efficient, epitope-targeted de novo antibody design, with notable implications for the development of molecular tools and therapeutics.

12
A Cross-Study Multi-Organ Cell Atlas of Macaca fascicularis Informed by Human Foundation Model Annotation: A Resource for Translational Target Assessment

Souza, T. M.; Gamse, J. T.; Moreno, L.; van Rumpt, M.; Nunez-Moreno, G.; Khatri, I.; van Asten, S. D.; Khusial, N. V.; Baltasar-Perez, E.; Adhav, R.; Abdelaal, T.; Wojtuszkiewicz, A.; Calis, J. J. A.; Csala, A.; Dahlman, A.; Fuller, C. L.; Thalhauser, C. J.; Kolder, I. C. R. M.

2026-03-19 bioinformatics 10.64898/2026.03.17.711997 medRxiv
Top 0.2%
1.5%

Non-human primates (NHPs), particularly Macaca fascicularis (cynomolgus macaque), represent an essential model for preclinical assessment of biologics due to their high genetic and physiological similarity to humans. However, mounting regulatory pressure to reduce NHP use and the lack of a unified, well-annotated single-cell atlas currently limit both target qualification and mechanistic interpretation of toxicity in this species. To address this gap, we assembled and harmonized the largest single-cell transcriptomic atlas of M. fascicularis to date, integrating 30 publicly available studies spanning 57 anatomical regions, 43 organs and 14 physiological systems. We implemented a scalable framework for cross-species cell type annotation by embedding both cynomolgus monkey and human (Tabula Sapiens V2) datasets into a shared reference space using Universal Cell Embeddings (UCE), enabling consistent harmonization of cell identities. In total, 27 organs were annotated using human reference labels, while the remaining sets retained author-provided annotations or labels transferred from other cynomolgus studies with available annotations. The resulting atlas comprises over 2.5 million cells and demonstrates concordance in cell-type-specific expression patterns between cynomolgus and humans, including tissue-specific markers and targets relevant for biologics development. Through translational use cases, we illustrate how this resource can be applied to assess target expression in tissues affected by concordant human-NHP toxicities, investigate ocular adverse events associated with antibody-drug conjugates (ADCs), and identify species-specific features of immune cell subtypes with known safety implications.
By enabling scalable, high-resolution, cross-species comparisons of gene expression across organs, tissues, and cell states, this atlas supports improved target qualification, more mechanistic interpretation of toxicities, and evidence-based decisions on the relevance and design of NHP studies. Collectively, this work provides a unified cross-species single-cell resource for cynomolgus monkey and a modular computational framework that advances new approach methodologies and contributes to the refinement and reduction of NHP use in preclinical research.

13
IMMREP25: Unseen Peptides

Richardson, E.; Aarts, Y. J. M.; Altin, J. A.; Baakman, C. A. B.; Bradley, P.; Chen, B.; Clifford, J.; Dhar, M.; Diepenbroek, D.; Fast, E.; Gowthaman, R.; He, J.; Karnaukhov, V.; Marzella, D. F.; Meysman, P.; Nielsen, M.; Nilsson, J. B.; Deleuran, S. N.; Parizi, F. M.; Pelissier, A.; Pierce, B. G.; Rodriguez Martinez, M.; Roran A R, D.; Saravanakumar, S.; Shao, Y.; Smit, N.; Van Houcke, M.; Visani, G. M.; Wan, Y.-T. R.; Wang, X.; Woods, L.; Wuyts, S.; Xiao, C.; Xue, L. C.; IMMREP25 Participant Consortium; Barton, J.; Noakes, M.; May, D. H.; Peters, B.

2026-04-01 bioinformatics 10.64898/2026.03.30.715276 medRxiv
Top 0.2%
1.3%

T cell receptors (TCRs) can bind to peptides presented by MHC molecules (pMHC) as a first step to trigger a T cell response. Reliable approaches to predict TCR:pMHC binding would have broad applications in clinical diagnostics, therapeutics, and the fundamental understanding of molecular interactions. IMMREP is a community-organized series of prediction contests that asks participants to predict TCR:pMHC binding on unpublished datasets. Previous iterations in 2022 and 2023 showed multiple approaches can predict TCR:pMHC binding with significant accuracy (median AUC_0.1 ≥ 0.7) for peptides where experimental data is available ("seen" peptides). In contrast, models did not outperform random guessing for peptides that have no such data available ("unseen" peptides). Here we report on the results of IMMREP25, which focused solely on unseen peptides in order to evaluate the cutting edge of the field. We received 126 named submissions predicting the specificity of 1,000 TCRs against twenty unseen peptides restricted by one of two MHC molecules (HLA-A*02:01 and HLA-B*40:01). The best performing methods showed a macro-AUC_0.1 of 0.60, significantly better than random, demonstrating significant advances in the field. The top performing methods incorporated structural modeling into their approach, indicating that especially for unseen peptides, a structural understanding aids in the prediction of TCR:pMHC interactions. The results from this benchmark highlight the significant challenges remaining for TCR:pMHC predictions and will inform future method development.

14
Cirrina: LLM-driven pharmacological reasoning agent enables preclinical CNS drug evaluation

Rajbanshi, B.; Iqbal, K.; Guruacharya, A.

2026-03-31 pharmacology and toxicology 10.64898/2026.03.29.713781 medRxiv
Top 0.2%
1.3%

Assessing whether a preclinical drug candidate will work is not a prediction problem but a reasoning problem. The same numerical output warrants different interpretations depending on the target and therapeutic context. CNS drug development presents the most demanding instance of this reasoning problem. For example, a compound must cross the blood-brain barrier, resist efflux transport, and achieve adequate receptor occupancy at a dose that clears safety margins. The constraints interact with each other in a web that needs careful interpretation. Here, we show that Cirrina, an LLM agent coupled to eight mechanistic pharmacology tools, can reason across the input data to provide better decisions and a well-documented reasoning trace. The LLM agent reasons across multiple data tiers, from SMILES to animal PK/PD measurements, adjusting thresholds based on target-specific requirements. Validated against 181 CNS compounds, it achieved 68% accuracy, compared with 31% for a rule-based deterministic pipeline. In 103 discordant cases, the agent's reasoning was correct in 75% of instances compared to only 10% for deterministic pipelines. Cirrina provides a scalable, documented framework for preclinical decision-making, effectively identifying failure-prone candidates that generic thresholds overlook, and thereby reducing the chances of failure along the clinical development cycle.

15
Comprehensive characterization of V(D)J recombination from long-read transcriptomic data with VDJcraft

Hu, K.; Rosenberg, A. F.; Song, Y.; Fan, C.-H.; Peng, Z.; Gao, M.; Chong, Z.

2026-04-05 bioinformatics 10.64898/2026.04.01.715879 medRxiv
Top 0.2%
1.2%

V(D)J recombination generates antigen receptor diversity in developing B and T cells. Long-read transcriptome technologies (e.g., PacBio Iso-Seq, Nanopore RNA/cDNA) capture full-length transcripts and thus resolve V(D)J events more accurately than short-read platforms. However, existing short-read tools are not applicable to or optimized for long-read data. We developed VDJcraft, the first integrated pipeline designed for V(D)J recombination analysis using long-read transcriptome sequencing data. The workflow uses a two-pass alignment strategy: global alignment to the GENCODE reference with minimap2, followed by local realignment and annotation using the international ImMunoGeneTics information system (IMGT). A customized module enhances D-gene detection sensitivity and positional precision. Sequencing errors are reduced through consensus-based correction toward the predominant subclass. Antigen-binding regions are annotated using IMGT-defined motifs to characterize CDRs and binding site composition. VDJcraft was validated on simulated and Human Genome Structural Variation Consortium (HGSVC) datasets and applied to disease datasets. It accurately recovered full-length V(D)J-C sequences and outperformed existing methods in gene detection and recombination accuracy. Long-read calls also showed significantly higher concordance with high-confidence short-read calls (Mann-Whitney U test, p = 1.55 × 10^-4). Additionally, we identified 31 putative novel gene subclasses absent from the IMGT database from HGSVC datasets. Analyses of longitudinal blood samples from a COVID-19 patient revealed distinct V(D)J recombination patterns and segment enrichment, characterized by increased IGHV1-2 usage, enrichment of the IGHV3-7/IGHD6-9/IGHJ5_02 rearranged clonotype, and a transient peak in IgG2 levels at day 4 followed by a gradual return to baseline.
In conclusion, VDJcraft provides a robust framework for long-read V(D)J characterization and enables the discovery of disease-associated immune signatures.

16
Influence of transglutaminase mediated crosslinking on the structure-function-digestion properties of Lupinus angustifolius protein evaluated using a multiscale approach

Mukherjee, A.; Duijsens, D.; Faeye, I.; Weiland, F.; Grauwet, T.; Van de Voorde, I.

2026-03-20 bioengineering 10.64898/2026.03.18.712645 medRxiv
Top 0.3%
0.9%

This study presents a multidisciplinary approach to evaluate the structure formation and digestion of lupin protein crosslinked with transglutaminase (TG). TG was applied at 0-10 U/g protein, and structural development was assessed by oscillatory rheology (G′, G″), while SDS-PAGE and o-phthaldialdehyde (OPA) assays were used to evaluate protein participation and the reduction of free ε-amino groups, respectively. Proteomics was further employed to characterise molecular features associated with crosslinking behaviour. Lupin protein showed a clear dose-dependent increase in gel strength during incubation, with G′ values reaching 214 ± 43.9 Pa at 10 U/g TG, compared to 7.2 ± 0.6 Pa in the untreated control. Across all conditions, G′ remained higher than G″ throughout frequency sweeps, and low tan δ values confirmed the formation of elastic networks driven by covalent crosslinks. SDS-PAGE and OPA results consistently demonstrated efficient crosslink formation, which increased with both incubation time and TG dosage, with SDS-PAGE indicating involvement of specific protein fractions. Proteomic analysis revealed that disordered structural domains in the protein are the preferred regions for crosslink formation. Furthermore, TG treatment was found to slow the digestibility of the crosslinked lupin protein. Overall, this work demonstrates how integrating proteomic insights with functional measurements can guide the selection and optimisation of plant proteins for enzymatic structuring. The approach offers a rational pathway to enhance the functionality of alternative protein sources such as lupin, supporting the development of sustainable food systems, including applications in meat and dairy analogues.

17
ViralMap: Predicting Features in Viral Proteins from Primary Sequence

Dwivedi, S.; Kar, S.; Horton, A. P.; Gollihar, J. D.

2026-04-09 bioinformatics 10.64898/2026.04.07.716565 medRxiv
Top 0.3%
0.9%

Modern viral vaccines are designed to elicit an immune response against viral proteins that mediate infection, making those proteins important targets for characterization and engineering. To improve vaccine efficacy, the proteins often require changes to specific residues or domains to improve immunogenicity and induce a protective response. These engineering strategies vary significantly across viruses, and comprehensive and accurate protein sequence annotation is crucial for guiding vaccine design. The growing risk of novel pathogen emergence and initiatives such as the CEPI 100 Days Mission to rapidly counter "Disease X" threats heighten the need for tools that can convert viral protein sequences from newly characterized genomes or emerging variants into the annotation profiles required for antigen engineering. To address this, we developed ViralMap, a multi-label annotation model tailored for eukaryotic viral proteins. By leveraging ESM-2 language model representations, ViralMap simultaneously predicts ten distinct annotation classes spanning domain topology and localization, post-translational modifications, and structural features directly from primary sequences. The model achieves a residue-level precision-recall area under the curve (PR-AUC) of 0.75 or greater for seven of the ten classes and realizes predictive performance competitive with established tools across the eight benchmarked classes. Case studies on complex glycoproteins, including the SARS-CoV-2 spike and HIV-1 Env, illustrated the model's ability to generalize across viral strains and to novel viral families not seen during training. By providing a unified, sequence-based framework for multi-label annotation, ViralMap offers a practical and scalable bridge from raw viral protein sequences to the annotation profiles required for antigen engineering.

18
Strategic template filtering accelerates fragment-based peptide docking

Trabelsi, N.; Varga, J. K.; Khramushin, A.; Lyskov, S.; Schueler-Furman, O.

2026-03-30 bioinformatics 10.64898/2026.03.26.714397 medRxiv
Top 0.3%
0.9%

Peptide-protein interactions are often transient and structurally elusive, necessitating computational approaches to identify both binding sites and peptide conformations. PatchMAN, one of the leading but computationally expensive biophysics-based global peptide-docking protocols, addresses this challenge by treating peptide docking as a protein-folding problem, using structural motifs from solved structures as templates that are subsequently refined using Rosetta FlexPepDock. Here we present PatchMAN2, which introduces 1) strategic fragment filtering and 2) local docking modes that focus sampling on relevant surfaces or known binding regions, thereby reducing the high computational cost that the original implementation incurs by extensively refining many non-productive, low-quality fragments. Benchmarking shows that PatchMAN2 removes ~30-70% of unnecessary fragments while preserving accuracy, substantially reducing runtime and improving the practical efficiency of peptide-protein docking.

19
Cytotoxicity-based High-throughput Screening System for CAR T Cell

Okuma, A.; Ishida, Y.; Miura-Yamashita, T.; Kawara, T.; Ito, D.; Yoshida, K.; Mimura, S.; Nakao, Y.; Iwamoto, T.; Hisada, S.; Takeda, S.

2026-03-31 synthetic biology 10.64898/2026.03.31.715247 medRxiv
Top 0.3%
0.9%

Some chimeric antigen receptor (CAR) T cell therapies have shown strong clinical efficacy, yet systematic screening of new CAR designs remains constrained by labor-intensive, low-throughput evaluation methods. To address this limitation, we developed a cytotoxicity-centered, high-throughput screening platform that integrates single-cell pooled screening with fully automated arrayed screening, enabling both large-scale library handling and quantitative functional resolution for systematic CAR design exploration. Using a mutation-based CAR design approach guided by protein fitness prediction, we generated a 4-1BB-based CAR library with approximately 10 theoretical variants while minimizing the prevalence of low-activity designs. In pooled screening, CAR T cells were evaluated at the single-cell level based on cytotoxicity and proliferation, enabling rapid enrichment of high-performing variants from a highly diverse library. Subsequent automated arrayed screening quantitatively measured cytotoxicity with high reproducibility, providing high-resolution functional data suitable for comparative ranking. Selected CAR variants demonstrated superior antitumor efficacy in a leukemia xenograft model compared with a template CAR. Furthermore, systematic analysis of mutation sites from an enhanced CAR variant identified essential mutation combinations underlying functional enhancement. Together, this study establishes a cytotoxicity-focused screening framework that provides a robust approach for optimizing CAR architectures and accelerating the development of CAR T-cell therapies.

20
CROWN: Curated Repository Of Well-resolved Noncovalent interactions

Poelmans, R.; Van Eynde, W.; Bruncsics, B.; Bruncsics, B.; Arany, A.; Moreau, Y.; Voet, A. R.

2026-04-01 bioinformatics 10.64898/2026.03.30.714168 medRxiv
Top 0.3%
0.8%

The development of machine learning models for protein-ligand interactions is fundamentally constrained by the quality and diversity of available structural data. Existing databases of protein-ligand complexes present researchers with an unsatisfying trade-off: carefully curated collections such as PDBBind and HiQBind offer high structural reliability but cover only a narrow slice of the Protein Data Bank (PDB), while large-scale resources like PLInder provide broad coverage at the expense of rigorous quality control. Here, we introduce CROWN (Curated Repository Of Well-resolved Non-covalent interactions), a machine learning-ready dataset that reconciles this tension by applying a comprehensive, fully automated preprocessing pipeline to the PLInder database. Starting from 649,915 protein-ligand interaction systems, CROWN applies a series of interleaved quality filters and processing stages addressing crystallographic resolution, ligand identity, pocket completeness, structural repair, interaction quality, and protonation at physiological pH. A distinguishing feature of the pipeline is a final constrained energy minimisation step using custom flat-bottomed restraints, which balances crystallographic evidence with relaxation of intramolecular strain. This step -- absent from existing protein-ligand datasets -- produces structurally uniform complexes by reconciling the heterogeneous refinement practices of different crystallographers and structure determination protocols, without distorting the experimentally observed binding geometry. The resulting dataset of 153,005 complexes represents a roughly four-fold increase in protein and species diversity over PDBBind and HiQBind, while maintaining rigorous structural standards.
Importantly, CROWN adopts a geometry-centric design philosophy that treats the 3D arrangement of atoms at the binding interface as a self-consistent source of information, rather than relying on externally measured binding affinities that cover only a fraction of known structures and introduce well-documented biases. We anticipate that CROWN will serve as a broadly useful resource for training generative models of protein-ligand binding poses, developing scoring functions, and benchmarking interaction prediction methods.