
ImmunoInformatics

Elsevier BV

Preprints posted in the last 90 days, ranked by how well they match ImmunoInformatics's content profile, based on 11 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
IMMREP25: Unseen Peptides

Richardson, E.; Aarts, Y. J. M.; Altin, J. A.; Baakman, C. A. B.; Bradley, P.; Chen, B.; Clifford, J.; Dhar, M.; Diepenbroek, D.; Fast, E.; Gowthaman, R.; He, J.; Karnaukhov, V.; Marzella, D. F.; Meysman, P.; Nielsen, M.; Nilsson, J. B.; Deleuran, S. N.; Parizi, F. M.; Pelissier, A.; Pierce, B. G.; Rodriguez Martinez, M.; Roran A R, D.; Saravanakumar, S.; Shao, Y.; Smit, N.; Van Houcke, M.; Visani, G. M.; Wan, Y.-T. R.; Wang, X.; Woods, L.; Wuyts, S.; Xiao, C.; Xue, L. C.; IMMREP25 Participant Consortium; Barton, J.; Noakes, M.; May, D. H.; Peters, B.

2026-04-01 bioinformatics 10.64898/2026.03.30.715276 medRxiv
Top 0.1%
14.0%

T cell receptors (TCRs) can bind to peptides presented by MHC molecules (pMHC) as a first step to trigger a T cell response. Reliable approaches to predict TCR:pMHC binding would have broad applications in clinical diagnostics, therapeutics, and the fundamental understanding of molecular interactions. IMMREP is a community-organized series of prediction contests that asks participants to predict TCR:pMHC binding on unpublished datasets. Previous iterations in 2022 and 2023 showed multiple approaches can predict TCR-pMHC binding with significant accuracy (median AUC_0.1 ≥ 0.7) for peptides where experimental data is available ("seen" peptides). In contrast, models did not outperform random guessing for peptides that have no such data available ("unseen" peptides). Here we report on the results of IMMREP25, which focused solely on unseen peptides in order to evaluate the cutting edge of the field. We received 126 named submissions predicting the specificity of 1,000 TCRs against twenty unseen peptides restricted by one of two MHC molecules (HLA-A*02:01 and HLA-B*40:01). The best performing methods showed a macro-AUC_0.1 of 0.60, significantly better than random, demonstrating significant advances in the field. The top performing methods incorporated structural modeling into their approach, indicating that especially for unseen peptides, a structural understanding aids in the prediction of TCR:pMHC interactions. The results from this benchmark highlight the significant challenges remaining for TCR:pMHC predictions and will inform future method development.
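For readers unfamiliar with the AUC_0.1 notation, the following is a minimal sketch of a partial ROC AUC restricted to false-positive rates up to 0.1, averaged over peptides to give a macro score. The abstract does not spell out the exact definition used by the challenge; this sketch assumes the McClish standardization (the same convention scikit-learn applies for `roc_auc_score(..., max_fpr=0.1)`), under which 0.5 corresponds to random ranking. The function names and the per-peptide input layout are hypothetical.

```python
from statistics import mean


def partial_auc(labels, scores, max_fpr=0.1):
    """Standardized partial ROC AUC over FPR in [0, max_fpr] (0.5 = random).

    Assumption: McClish standardization; the challenge's own definition
    may differ in detail.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    # Rank by descending score and build ROC points, grouping tied scores.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
        i = j
    # Trapezoidal area up to max_fpr, interpolating the last segment.
    area = 0.0
    for k in range(1, len(fpr)):
        x0, y0, x1, y1 = fpr[k - 1], tpr[k - 1], fpr[k], tpr[k]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    min_area = 0.5 * max_fpr ** 2  # area under the diagonal up to max_fpr
    return 0.5 * (1 + (area - min_area) / (max_fpr - min_area))


def macro_partial_auc(per_peptide, max_fpr=0.1):
    """Unweighted mean over peptides; per_peptide maps name -> (labels, scores)."""
    return mean(partial_auc(y, s, max_fpr) for y, s in per_peptide.values())
```

Averaging per peptide rather than pooling all pairs keeps peptides with many TCRs from dominating the score, which is presumably why a macro average is reported.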

2
UnivAIRRse: A Unified Framework for Organizing and Comparing Adaptive Immune Receptor Repertoire Simulators

Abdollahi, N.; Kaveh, S.; Shayesteh, S.; Mommahed, S.; Alemzadeh, Y.; Zarrin, R.; Chaker Hosseini Zavareh, F.; Esmaeili, P.; Hassanzadeh, R.; Kossida, S.; Eslahchi, C.

2026-02-19 bioinformatics 10.64898/2026.02.19.706510 medRxiv
Top 0.1%
13.9%

Adaptive immune receptor repertoire sequencing (AIRR-seq) enables large-scale profiling of B- and T-cell receptor diversity and has become a cornerstone of modern computational immunology. However, AIRR-seq provides only a partial and lossy molecular snapshot of immune dynamics, lacking explicit ground truth for clonal ancestry, lineage trajectories, antigen specificity, and longitudinal immune evolution. This limitation complicates benchmarking, method validation, and mechanistic interpretation of repertoire analysis pipelines. Here, we introduce UnivAIRRse, a unified hierarchical framework that organizes AIRR simulators within a shared conceptual coordinate system spanning five operational levels, from observed sequence data to the theoretical generative potential of the adaptive immune system. By explicitly distinguishing sequence-, clonal-, specificity-, repertoire-, and generative-level representations, UnivAIRRse enables systematic comparison of simulator assumptions, biological scope, abstraction level, and application focus. To our knowledge, this is the first review to formalize such a unified structure across biological, computational, and functional layers of AIRR simulation. Using this framework, we review how simulation supports benchmarking, strengthens computational inference, and enables multi-scale investigation of immune repertoire formation and evolution. We identify persistent limitations in existing simulators, including incomplete biological context, limited modularity, restricted interoperability, and overreliance on AIRR-seq as a molecular proxy for complex spatiotemporal immune processes. To operationalize this framework, we provide an interactive web-based AIRR Simulation Landscape Explorer (publicly available at https://www.imgt.org/AIRR-Simulator/) that enables dynamic filtering and comparison of simulators across biological scope, abstraction level, output fidelity, and application focus. 
Finally, we outline emerging directions toward digital-twin-ready immune simulation, emphasizing modular architectures, longitudinal multi-omic integration, uncertainty quantification, and dynamic model updating. By providing a coherent conceptual and operational coordinate system, UnivAIRRse establishes a foundation for reproducible, interpretable, and clinically actionable modeling of adaptive immune repertoires, bridging current simulation practices with the next generation of predictive and personalized immunological modeling.

3
AI predicted TCR-pMHC structures differentiate immune interactions

Robben, M. W.

2026-02-26 immunology 10.64898/2026.02.24.707744 medRxiv
Top 0.1%
10.3%

The T Cell Receptor (TCR) is a highly variable component of the T cell immune response that recognizes unique epitopes presented on MHC molecules (pMHC). Random genetic recombination limits the ability of sequence homology to predict epitope specificity, which is more dependent on the strength of the TCR-pMHC binding interaction. Structures for understanding this interaction only exist for well-characterized positive interactors, and there is no information available about the physical interaction of non-specific TCR-pMHCs. In this study, we explore the ability of structural prediction algorithms to generate interacting and non-interacting multimeric TCR-pMHC structures, then examine features that can predict immune interaction. AlphaFold2 shows more consistent multimeric structure prediction compared to other deep learning structure generators or template-based algorithms. Poor structure generation does not correlate with immune interaction, and non-interacting structures show similar structural properties to interacting structures. However, this results in less energetically stable conformations in non-interacting structures. Molecular dynamics simulation supports this finding and reveals a novel structural conformation that contributes mechanistically to proper immune synapse. We show that structural and physical features extracted from generated structures are more predictive of interaction than sequence-based features. To support researchers in the prediction of TCR-epitope specificity we have made our structural prediction models available through an accessible notebook-based webserver: https://github.com/RobbenLab/TCRSIP.

4
Computational Development of a GluN1 Synthetic Peptide Mimetic for Neutralization of Autoantibodies in Anti-NMDAR Autoimmune Encephalitis

Misra, P.; Movva, N. S. V.; Shah, R.

2026-03-30 bioinformatics 10.64898/2026.03.26.714496 medRxiv
Top 0.1%
6.3%

Purpose/Objective: This study aimed to design and computationally evaluate a synthetic GluN1-mimetic peptide as a decoy to bind and neutralize pathogenic autoantibodies in anti-NMDA receptor (NMDAR) encephalitis, a severe autoimmune neurological disorder affecting approximately 1.5 per million individuals annually. Methods: Key GluN1 epitope residues (351-390 of the amino-terminal domain) were identified from crystallographic evidence and patient-derived antibody binding studies. Multiple peptide variants were rationally designed to mimic the antibody-binding interface. AlphaFold2 was used to predict peptide structures. Rigid-body docking simulations were conducted with HADDOCK 2.4 to model peptide-antibody complexes, and binding affinities were quantified using PRODIGY. A scrambled peptide control was included to establish docking specificity. Results: The top-performing peptide demonstrated favorable predicted binding (ΔG = -21.5 kcal/mol, Kd = 1.7 x 10^-16 M) with an average pLDDT score of 90%, a buried surface area of 3,255.5 Å², and 18 intermolecular hydrogen bonds. Relative to the scrambled control (ΔG = -8.3 kcal/mol), the designed peptide showed substantially stronger predicted binding. Conclusion/Implications: These results support the validity of an epitope-mimicry design strategy and establish a scalable computational framework for prioritizing peptide decoy candidates applicable to other antibody-mediated autoimmune disorders. Experimental validation remains necessary to confirm real-world efficacy.
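The predicted binding free energy and the dissociation constant in this abstract are linked by the standard thermodynamic relation ΔG = RT ln(Kd). The sketch below converts one into the other so the two reported ΔG values can be compared on a Kd scale. The abstract does not state the temperature; 25 °C is an assumption here, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def kd_from_delta_g(delta_g_kcal_per_mol, temp_celsius=25.0):
    """Dissociation constant (in M) from a binding free energy.

    Uses delta_G = R * T * ln(Kd); the 25 degC default is an assumption,
    not a value taken from the abstract.
    """
    temp_kelvin = temp_celsius + 273.15
    return math.exp(delta_g_kcal_per_mol / (R_KCAL * temp_kelvin))


designed = kd_from_delta_g(-21.5)   # designed peptide, on the order of 1e-16 M
scrambled = kd_from_delta_g(-8.3)   # scrambled control, on the order of 1e-6 M
print(f"designed: {designed:.2e} M, scrambled: {scrambled:.2e} M")
```

Because Kd depends exponentially on ΔG, the 13.2 kcal/mol gap between the designed peptide and the scrambled control corresponds to roughly ten orders of magnitude in predicted affinity, which is worth keeping in mind when reading docking-derived energies.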

5
KyDab - a comprehensive database of antibody discovery selection campaigns.

Zhou, Q.; Chomicz, D.; Melvin, D.; Griffiths, M.; Yahiya, S.; Reece, S.; Le Pannerer, M.-M.; Krawczyk, K.

2026-03-27 bioinformatics 10.64898/2026.03.25.713450 medRxiv
Top 0.1%
4.9%

Preclinical antibody discovery relies on progressive screening and down-selection of candidate antibodies from large immune repertoires, yet this critical process is poorly represented in existing public databases. Here we introduce KyDab (Kymouse Antibody Database), a well-curated database of antibody discovery selection data generated using standardized workflows on the Kymouse humanized mouse platform. The current release includes 11 immunisation studies in Kymouse platform mice covering 51 immunogens, more than 120,000 paired heavy-light chain sequences, and binding measurements for a selected subset of experimentally characterized clones. By capturing full-funnel selection data with consistent metadata and both positive and negative experimental outcomes, KyDab provides a valuable data resource for the development and evaluation of artificial intelligence models for antibody discovery. KyDab is accessible at https://kydab.naturalantibody.com, and the database will be continuously updated as new datasets become available.

6
Unsupervised identification of low-frequency antigen-specific TCRs using distance-based anomaly scoring

Kinoshita, K.; Kobayashi, T. J.

2026-03-11 bioinformatics 10.64898/2026.03.09.709174 medRxiv
Top 0.1%
3.9%

Identifying antigen-specific T cell receptors (TCRs) within the diverse human repertoire remains challenging due to their extremely low frequencies, often as rare as one per million cells. Here, we propose a novel unsupervised approach that detects low-frequency antigen-specific TCRs through distance-based anomaly detection in TCR sequence space. Our method is based on the observation that antigen-specific TCRs preferentially localize at the periphery of V gene clusters rather than cluster centers. Using TCRdist3 to quantify sequence distances, we identify query TCRs that are anomalous compared to reference repertoires within their V-J gene combinations. We validated this approach across three immunological contexts: COVID-19 infection, influenza vaccination, and yellow fever vaccination. For SARS-CoV-2-specific TCR detection in a COVID-19 patient, our method demonstrated 34.3% accuracy, significantly outperforming similarity-based (ALICE: 8.0%) and frequency-based methods (edgeR: 5.8%, the Pogorelyy method: 6.3%), and uniquely detected low-frequency antigen-specific TCRs at clone count one. The minimal overlap with conventional approaches (≤6.7%) indicates our method captures distinct TCR clones overlooked by existing analyses. This spatial distribution-based paradigm provides a complementary strategy for TCR specificity detection, particularly valuable for identifying rare antigen-specific clones essential for understanding immune responses.
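The general idea of distance-based anomaly scoring within V-J groups can be illustrated with a small sketch. This is not the authors' implementation: it swaps in plain Levenshtein distance on CDR3 strings where the abstract uses TCRdist3, and it scores each query by its mean distance to the k nearest reference sequences sharing the same V and J genes, which is one common anomaly score among several. All function names, the tuple layout, and the choice of k are assumptions made for illustration.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two strings (stand-in for TCRdist3)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def anomaly_scores(query, reference, k=3):
    """Score each query TCR against reference TCRs with the same V-J pair.

    query, reference: lists of (v_gene, j_gene, cdr3) tuples.
    Returns {tcr: mean distance to its k nearest reference CDR3s}, with
    None where the reference has no sequence for that V-J combination.
    Higher scores mean the TCR sits further from the reference cluster.
    """
    by_vj = defaultdict(list)
    for v, j, cdr3 in reference:
        by_vj[(v, j)].append(cdr3)
    scores = {}
    for v, j, cdr3 in query:
        pool = by_vj.get((v, j))
        if not pool:
            scores[(v, j, cdr3)] = None
            continue
        nearest = sorted(edit_distance(cdr3, r) for r in pool)[:k]
        scores[(v, j, cdr3)] = sum(nearest) / len(nearest)
    return scores
```

Ranking query TCRs by this score and keeping the top fraction gives a candidate list that does not depend on clone frequency, which matches the abstract's point about detecting clones with a count of one.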

7
Structure-Based TCR-pMHC Binding Prediction and Generalization to Unseen Peptides

Abeer, A. N. M. N.; Roy, R. S.; Qian, X.; Yoon, B.-J.

2026-02-23 bioinformatics 10.64898/2026.02.21.707231 medRxiv
Top 0.1%
3.7%

The interaction between T-cell receptors (TCRs) and the peptide-bound major histocompatibility complex (MHC) intricately impacts the functional specificity of the T-cell-mediated adaptive immune response. Consequently, its implications for immunotherapy have contributed to an ever-growing number of computational methods for TCR recognition, which have recently attracted structure-based approaches due to advancements in protein structure modeling. Despite access to structural information of the predicted binding interface, graph neural network (GNN)-based TCR-pMHC binding specificity classifiers tend to show poor accuracy for samples with unseen peptides. In this work, we comprehensively assess the potential factors that critically impact the generalization performance of classifiers trained with computationally predicted structures. Specifically, our experiments focus on analyzing the sensitivity of such predictors to the interaction features in the TCR-pMHC interface and the structural uncertainty. Building on the analysis, we demonstrate how the design of classifier architecture with auxiliary training objectives can improve the generalization performance to novel peptides not yet seen during model training. Overall, our work highlights the challenges of unseen peptide generalization from different perspectives of the GNN-based classifier paradigm, showcasing the strengths and weaknesses of the current state-of-the-art approaches in the generalization landscape.

8
PepCABO: Latent-space Bayesian optimization for peptide-MHC binding using contrastive alignment

Ghane, M.; Korpela, D.; Dumitrescu, A.; Lähdesmäki, H.

2026-03-16 bioinformatics 10.64898/2026.03.13.711540 medRxiv
Top 0.1%
3.7%

Motivation: Optimizing peptide sequences for binding to specific MHC class I alleles is a central challenge in immunotherapy and vaccine design. The combinatorial size of peptide space, the nonlinear nature of peptide-MHC interactions, and limited experimental budgets make efficient optimization difficult. Latent-space Bayesian optimization (LSBO) provides a framework by embedding discrete sequences into a continuous space where Bayesian optimization can be applied. However, existing LSBO methods do not effectively leverage binding data from related alleles and often rely on inefficient random initialization. Results: We propose PepCABO, an LSBO framework for peptide-MHC binding using contrastive alignment, which utilizes a dual variational autoencoder framework that jointly learns peptide-allele alignment and a Gaussian process surrogate prior to Bayesian optimization. This simultaneous training induces a latent geometry that reflects the binding landscape and enables structured knowledge transfer across alleles. The pretrained model shapes a structured latent space in which peptides with high objective values regarding a specific MHC allele are geometrically organized, while the jointly trained Gaussian process defines an informative prior over the objective in this space, enabling principled and efficient exploration of promising regions during subsequent optimization. Across 12 target alleles without prior binding data and under both low- and high-budget settings, PepCABO consistently outperforms various baselines. We observe faster convergence, improved area under the optimization curve, and stronger best-found binding affinities, suggesting improved sample efficiency under experimentally constrained scenarios. Code availability: The source code is available at https://github.com/mohsen-g/PepCABO

9
Applications of T-cell receptor specificity annotation models for quality control and immunomonitoring in adoptive T-cell therapies

Bosschaerts, T.; Van Houcke, M.; Meysman, P.; Wuyts, S.; Harari, A.; Chiffelle, J.; Auger, A.; Coukos, G.

2026-01-28 bioinformatics 10.64898/2026.01.26.701708 medRxiv
Top 0.1%
3.6%

Adoptive cell therapy (ACT) with tumor-infiltrating lymphocytes (TIL) is a powerful candidate immunotherapy. However, the personalized production process that underlies its potential also introduces an element of stochasticity, where the actual target of the TILs is variable across patients and protocols, and remains largely unknown. In this letter, we describe the application of T-cell receptor (TCR) sequencing in combination with computational tools for annotating epitope-specificity of the TCR repertoire of TIL products as a potent quality control. We highlight the potential of this approach by demonstrating the tumor antigen-specificity of TCR clusters within the TIL product in silico, and validating these responses in vitro. We also demonstrate screening for off-cancer reactivity against viral antigens, and immune monitoring of identified TCR clones.
Highlights:
- T-cell receptor sequencing combined with epitope annotation tools is viable for quality control of TIL products by identifying tumor-specific clones.
- We showed the additional utility of screening for off-cancer reactivity, such as viral antigens.
- The identified tumor-specific TCR clones can be tracked in vivo, providing a means of immune monitoring.

10
An Improved Dataset for Predicting Mammal Infecting Viruses from Genetic Sequence Information

Reddy, T.; Schneider, A.; Hall, A. R.; Witmer, A.; Hengartner, N.

2026-01-25 bioinformatics 10.1101/2025.09.17.676952 medRxiv
Top 0.1%
2.6%

There have been several attempts to develop machine learning (ML) models to identify human infecting viruses from their genomic sequences, with varying degrees of success. Direct comparison between models is problematic, because these models are typically trained and evaluated on different datasets with alternative data splitting schemes, features, and model performance metrics. In this paper we present a standardized dataset of mammal infecting and non-infecting viral pathogens, refined from the previous work of Mollentze et al. to include the latest literature evidence, roughly doubling the number of curated host-virus records available to the community, and new host target labels, primate and mammal. The new host labels were included for several reasons, including previous reports that classification performance is better at broader taxonomic ranks and the idea that there may be more data for primate infection that might serve as a suitable proxy for zoonotic potential and avoidance of false positives for human infection due to absence of evidence. On this dataset, we report the performance of eight machine learning models for predicting mammal-infecting viruses from their genomic sequences. We find that randomly assigning cases in our improved dataset to training/testing sets, when compared to the original assignments into training/testing in Mollentze et al., increases the overall average ROC AUC of prediction of human infection from 0.663 ± 0.070 to 0.784 ± 0.013, consistent with the reduction in phylogenetic distance between train and test sets (relative entropy change from 3.00 to 0.08). The broadest host category of mammal infection can be predicted most reliably at 0.850 ± 0.020. We share our improved dataset and code to enable standardized comparisons of machine learning methods to predict human host infections.
Overall, we have presented preliminary evidence that classification of virus host infection is more tractable at higher taxonomic ranks, that unsurprisingly reducing the phylogenetic distance between training and test sets can improve predictive performance, that peptide kmer features appear to be harmful to out of sample model performance, and we are left with the question of whether models for virus host prediction can reasonably be expected to perform well in out of sample scenarios given the likelihood that viruses do not share a common ancestor. Consistent with this concern, when the data is resampled such that there is no overlap between viral families in training and test sets (relative entropy > 24), models perform no better than random chance at prediction of human infection regardless of whether kmers are included (ROC AUC 0.50 ± 0.08) or not (ROC AUC 0.50 ± 0.04). Author Summary: Determining whether a virus can infect a human or other animal based on its genetic information is useful for assessing the threat level of circulating and newly emerging viruses. Previous studies in this domain have had access to limited datasets, and in this work we nearly double the amount of manually labelled host data for viral infection, so that others may build on it and improve it further. We use machine learning models to rank the likelihood of human and mammal infection for viruses in this improved dataset. Results are consistent with the determination of host infection being more tractable for broader categories of hosts, like mammals, than for specific species, like humans. This may suggest that the prospects are good for improved future models that first screen viruses based on their likelihood of infecting mammals, and then in a second stage for likelihood of human infection.
The most challenging scenarios were for predictions of viruses that were not similar to viruses in the training data, and the question remains whether we can expect reasonable generalization of predictive models to completely new viruses given that, at the time of writing, viruses do not appear to share a common ancestor.
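The family-disjoint resampling mentioned in this abstract, where no viral family appears in both training and test sets, can be sketched as a simple group-wise split. This is an illustrative sketch only, not the authors' code: the record layout with a `family` key, the function name, and the greedy fill to a target test fraction are all assumptions.

```python
import random


def family_disjoint_split(records, test_fraction=0.2, seed=0):
    """Split records so that no viral family occurs in both sets.

    records: list of dicts with a 'family' key (hypothetical layout).
    Whole families are moved to the test set, in shuffled order, until at
    least test_fraction of all records are held out.
    """
    families = sorted({r["family"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(families)
    target = test_fraction * len(records)
    test_families, n_test = set(), 0
    for fam in families:
        if n_test >= target:
            break
        test_families.add(fam)
        n_test += sum(r["family"] == fam for r in records)
    train = [r for r in records if r["family"] not in test_families]
    test = [r for r in records if r["family"] in test_families]
    return train, test
```

A random per-record split would instead scatter close relatives across both sets, which is the situation the abstract associates with the higher ROC AUC values.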

11
Benchmarking generative scaffold design methods for peptide engineering in TCR-MHC complexes

Xie, L.; Dam, G.-B.; Patel, Y.; Denzler, L.; Shao, Y.; Wang, R.; Caron, E.; Yasumizu, Y.; Hafler, D. A.; Rodriguez Martinez, M.

2026-01-23 bioinformatics 10.64898/2026.01.22.701133 medRxiv
Top 0.1%
2.1%

De novo peptide design at T cell receptor-peptide-major histocompatibility complex (TCR-pMHC) interfaces is a central challenge in computational immunology, with direct implications for vaccine development, cancer immunotherapy, and autoimmune disease. Despite rapid advances in generative protein modeling, there is currently no systematic benchmark evaluating these methods in the highly constrained and immunologically relevant setting of peptide-MHC presentation and TCR recognition. Here, we present two complementary contributions. First, we introduce a multi-stage computational pipeline for peptide design in predefined TCR-pMHC contexts, integrating generative modeling with sequence optimization and structure-based filtering. Second, we establish a benchmark for evaluating generative peptide design methods in TCR-pMHC complexes. Using a curated dataset of high-quality crystal structures deposited after the AlphaFold3 training cutoff, we assess state-of-the-art generative approaches for peptide backbone generation, sequence design, and the enrichment of near-native solutions. We explicitly examine whether different backbone generation strategies respect the geometric constraints of the MHC binding groove and recover native-like peptide conformations. Our results reveal substantial method-dependent differences: some generative strategies fail systematically in the groove-bound peptide setting, whereas others generate physically plausible backbones with varying accuracy and conformational diversity. We further show that enforcing anchor constraints strongly influences peptide conformations at non-anchor positions, highlighting a trade-off between structural accuracy and conformational sampling. 
To enable fair and reproducible comparison, we introduce a standardized, multi-stage scoring protocol that integrates MHC binding prediction, physics-based energy evaluation, and independent structure prediction confidence metrics to enrich near-native designs from large candidate pools. Together, this work establishes the first comprehensive pipeline and benchmark for generative peptide design at TCR-pMHC interfaces and provides practical guidelines for developing peptide design workflows and evaluating generative models in immunologically constrained protein design settings.

12
FluNexus: a versatile web platform for antigenic prediction and visualization of influenza A viruses

Li, X.; Zhou, C.; Wu, H.; Xiao, K.; Hao, J.; Zhao, D.; Zhu, J.; Li, Y.; Peng, J.; Gu, J.; Deng, G.; Cai, W.; Li, M.; Liu, Y.; Shang, X.; Chen, H.; Kong, H.

2026-01-30 bioinformatics 10.64898/2026.01.29.702696 medRxiv
Top 0.1%
2.0%

Influenza A viruses continuously undergo antigenic evolution to escape host immunity induced by previous infections or vaccinations, consequently causing seasonal epidemics and occasional pandemics. Antigenic prediction and visualization of influenza A viruses are crucial for precise vaccine strain selection and robust pandemic preparedness. However, a user-friendly online platform for these capabilities remains notably absent, despite widespread demand. Here, we present FluNexus (https://flunexus.com), the first-of-its-kind, one-stop-shop web platform designed to facilitate the prediction and visualization of the antigenic change in emerging variants. FluNexus features a data preprocessing module for hemagglutinin subunit 1 (HA1) and hemagglutination inhibition (HI) data across three major public health threat subtypes (H1, H3 and H5). Meanwhile, FluNexus provides an interactive interface for online antigenic prediction and offers practical guidance for researchers. Most notably, FluNexus offers the visualization of influenza A virus antigenic evolution, providing intuitive insights into its antigenic dynamics. Specifically, FluNexus proposes a novel manifold-based method for positioning antigens and antisera, generating accurate antigenic cartographies even with sparse HI data. By alleviating the programming burden on biologists, FluNexus supports more informed decision-making in vaccine strain selection and strengthens surveillance and pandemic preparedness.
Highlights:
- FluNexus features a data preprocessing module for HA1 and HI data spanning the H1, H3, and H5 subtypes.
- FluNexus facilitates online antigenic prediction utilizing ten state-of-the-art antigenic prediction tools, and offers practical guidance based on a comparative evaluation of their performance.
- FluNexus provides a visualization module for mapping antigenic evolution of influenza A viruses, incorporating a novel manifold-based method for antigenic cartography.

13
SpeciefAI: Multi-species mRNA-level Antibody Framework Generation using Transformers

Grabarczyk, D.; Kocikowski, M.; Parys, M.; Cohen, S. B.; Alfaro, J. A.

2026-03-18 bioinformatics 10.64898/2026.03.16.712018 medRxiv
Top 0.1%
1.8%

Motivation: Encoding antibodies (Abs) and nanobodies (Nbs) as mRNA enables in vivo production of therapeutic proteins. However, this approach requires meeting two species-dependent requirements: the mRNA encoding must support efficient expression in the host species, and the encoded protein sequence must resemble the natural Ab repertoire of the recipient species to minimize immunogenicity. These requirements motivate species-conditioned generative models for joint mRNA and protein design. Results: We propose SpeciefAI, a transformer-based model for multi-species Ab and Nb species sequence-harmonisation by generation of novel Framework Regions (FRs) tailored to input Complementarity-Determining Regions (CDRs). Our model works directly in the mRNA space and learns the correspondence between FRs and CDRs in six species. The model is capable of generating sequences with a highly similar distribution to natural sequences and a mean absolute difference in codon adaptation index (CAI) of 0.013 and 0.033 for humans and dogs respectively. We show that the generated human sequences are highly human (0.95 T20 score) and canine sequences highly canine (0.95 cT20 score). We furthermore demonstrate that we can generate diverse candidate sequences using our method. Availability and Implementation: Source code is available at https://github.com/Dominko/SpeciefAI. OAS and COGNANO data are publicly available at https://opig.stats.ox.ac.uk/webapps/oas/ and https://cognanous.com/datasets/vhh-corpus (preprocessed versions available upon request). Canine data is available at https://zenodo.org/records/18301526.

14
GRIMM-II: A Two-Stage Real-Time Algorithm for Nine-Locus HLA Imputation and Matching with Up to Three Mismatches

Kirshenboim, O.; Kabya, A.; Yehezkel-Imra, R.; Tshuva, Y.; Maiers, M.; Gragert, L.; Bashyal, P.; Israeli, S.; Louzoun, Y.

2026-03-31 bioinformatics 10.64898/2026.03.28.715027 medRxiv
Top 0.1%
1.6%

Background: The success of hematopoietic stem cell transplantation (HSCT) depends critically on human leukocyte antigen (HLA) matching between donor and recipient. While traditional matching focuses on five classical HLA loci (A, B, C, DRB1, DQB1), clinical practice increasingly considers extended typing at nine loci, including DPA1, DQA1, DPB1, and DRB3/4/5. Furthermore, emerging evidence supports transplantation with up to three HLA mismatches under post-transplant cyclophosphamide (PTCy) regimens. However, current donor search algorithms cannot efficiently identify donors with multiple mismatches across extended HLA loci in real-time. Methods: We developed GRIMM-II (GRaph IMputation and Matching, version II), which comprises two novel algorithms: ML-GRIM (Multi-Locus GRIM) for HLA imputation across multiple loci, and ML-GRMA (Multi-Locus GRMA) for real-time donor-patient matching with up to three mismatches. Both algorithms employ a two-stage approach that combines efficient candidate reduction through graph-theoretic frameworks with detailed genotype comparison. ML-GRIM partitions genotypes into class I (HLA-A, B, C) and class II (remaining loci) components, enabling memory-efficient storage and rapid candidate identification. ML-GRMA searches a pre-imputed donor graph composed of donor genotypes and their sub-components, then computes asymmetric graft-versus-host (GvH) and host-versus-graft (HvG) mismatch probabilities to provide clinically relevant compatibility assessments. Both imputation and matching tools are available as a web application at https://grimmard.math.biu.ac.il/ and through GitHub repositories at https://github.com/nmdp-bioinformatics/py-graph-imputation (imputation) and https://github.com/nmdp-bioinformatics/py-graph-match (matching).
Results: We validated ML-GRMA and ML-GRIM using the WMDA3 (World Marrow Donor Association) validation dataset, successfully reproducing all previously reported matches while identifying numerous additional candidate donors not detected by previous algorithms. Further validation of ML-GRMA using 3,000 patients with artificially introduced mismatches (0-3 allele substitutions) demonstrated 100% sensitivity and specificity in identifying matching donors at expected mismatch levels. We validated ML-GRIM using simulated nine-locus typings derived from 8,078,224 US donors in the NMDP registry. The algorithm successfully imputed genotypes across variable numbers of typed loci while incorporating multiethnic haplotype frequencies. The algorithm achieved real-time performance with typical imputation times under one second and matching times of 1-13 seconds per patient for up to three mismatches, even when searching databases exceeding 8 million donors. Notably, ML-GRMA identified substantially more potentially suitable donors than traditional algorithms by accounting for the biological reality that GvH and HvG mismatches often differ, particularly for donors homozygous at specific loci. To evaluate ML-GRIM performance with low-resolution typing, we tested it on simulated 3-locus typings from the same population. The resulting imputation accuracy correlated with the mutual information between typed loci and complete genotypes. Conclusions: GRIMM-II provides a scalable, memory-efficient solution for nine-locus HLA imputation and real-time identification of donors with up to three mismatches. The graph-based framework supports dynamic registry updates and can readily accommodate additional HLA loci and matching criteria as clinical knowledge evolves.
By expanding the pool of acceptable donors while maintaining computational efficiency, GRIMM-II addresses a critical need in contemporary transplantation practice, particularly for patients from underrepresented ethnic minorities who face lower probabilities of finding perfectly matched donors.
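
The asymmetry between GvH and HvG mismatches mentioned in this abstract can be illustrated with a minimal sketch. It is not the ML-GRMA implementation: the function names are hypothetical, genotypes are assumed to be fully resolved (no imputation probabilities), and a mismatch is simplified to a distinct allele carried by one party and absent from the other.

```python
def directional_mismatches(patient, donor):
    """Count (GvH, HvG) mismatches at a single locus.

    patient, donor: pairs of allele names, e.g. ("A*01:01", "A*02:01").
    Simplifying assumption: a mismatch is a distinct allele present in
    one party and absent from the other.
    """
    patient_alleles, donor_alleles = set(patient), set(donor)
    # GvH direction: patient alleles that the graft would see as foreign.
    gvh = len(patient_alleles - donor_alleles)
    # HvG direction: donor alleles that the host would see as foreign.
    hvg = len(donor_alleles - patient_alleles)
    return gvh, hvg


def total_mismatches(patient_typing, donor_typing):
    """Sum directional mismatches over all loci typed in the patient."""
    gvh_total = hvg_total = 0
    for locus, patient_pair in patient_typing.items():
        gvh, hvg = directional_mismatches(patient_pair, donor_typing[locus])
        gvh_total += gvh
        hvg_total += hvg
    return gvh_total, hvg_total


# Hypothetical typings: the donor is homozygous at HLA-A.
patient = {"A": ("A*01:01", "A*02:01"), "B": ("B*07:02", "B*08:01")}
donor = {"A": ("A*01:01", "A*01:01"), "B": ("B*07:02", "B*08:01")}
print(total_mismatches(patient, donor))
```

With a homozygous donor, the patient carries an allele the donor lacks, while every donor allele is shared with the patient, so the two directions give different counts.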

15
immunoPETE: A DNA-based integrated B-cell and T-cell receptor profiling platform

Zhao, H.; Mirebrahim, H.; Telman, D.; Dannebaum, R.; McNamara, S.; Tabari, E.; Lin, H.; Rubelt, F.; Berka, J.; Luong, K.; Joseph, M.; Bryan, R.; Ward, D.; Hayday, A.; Utiramerur, S.; Kumar, D.; Asgharian, H.

2026-03-20 immunology 10.64898/2026.03.17.712532 medRxiv
Top 0.1%
1.5%

The vast diversity of B and T cell receptors generated through the recombination of Variable (V), Diversity (D), and Joining (J) gene segments plays a critical role in adaptive immunity. Profiling immune repertoires at the DNA level provides a robust and stable approach to capture the clonal composition of these receptors. immunoPETE is an assay designed to target recombined human T-cell Receptor Beta (TRB), T-cell Receptor Delta (TRD), and Immunoglobulin Heavy (IGH) chain genes directly from genomic DNA. Simultaneous profiling of B and T cell receptor chains in a single reaction provides internally normalized clone counts and facilitates the study of B-T cell interactions. Full-length amplicon consensus sequences representative of original template DNA molecules are accurately reconstructed using Unique Molecular Identifiers (UMIs). An in-house pipeline compiles VDJ rearrangements from the Complementarity-Determining Region 3 (CDR3) of TRB, TRD and IGH chains into comprehensive readouts at cell-level resolution. In this study, we describe the immunoPETE end-to-end workflow, followed by a comprehensive benchmarking of its performance in adaptive immune profiling. Where applicable, we used both natural and contrived samples and characterized the assay's accuracy, linearity, and reproducibility across several metrics: retrieving CDR3 sequences, determining B and T cell ratios, total cell count, yield, fraction of functional rearrangements, clonal diversity, composition of dominant clones, pairwise similarity, and V/J gene usage frequencies. Furthermore, we assessed its quantitative limits concerning the total number of lymphocytes and the detection of rare clones. As an example of its applications, we show that adding immune biomarkers extracted from immunoPETE data to clinical factors improves prediction of progression-free survival in a cohort of non-muscle invasive bladder cancer (NMIBC) patients.
Finally, we discuss the broad applications of immunoPETE in the study of aging, cancers, infections, and autoimmune disorders with reference to select published studies.
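
The idea of reconstructing a consensus sequence per UMI can be sketched as follows. This is a generic illustration, not the immunoPETE in-house pipeline: it assumes reads are already grouped by an exact UMI string, ignores indels by keeping only reads of the most common length, and takes a per-position majority vote. The function name and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Build one consensus sequence per UMI by per-position majority vote.

    reads: iterable of (umi, sequence) pairs.
    Assumption: reads sharing a UMI derive from the same template molecule;
    reads whose length differs from the most common length are dropped.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # Keep reads of the most common length so columns line up.
        length = Counter(len(s) for s in seqs).most_common(1)[0][0]
        same_len = [s for s in seqs if len(s) == length]
        # Majority base at each position across the retained reads.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*same_len)
        )
    return consensus


# Toy example: one read under UMI "AAT" carries a single sequencing error.
reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACCT"), ("GGC", "TTTT")]
result = umi_consensus(reads)
print(result)
```

Counting distinct UMIs per rearrangement, rather than raw reads, is what makes the clone counts reflect template molecules instead of amplification depth.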

16
Effects of protein interface mutations on protein quality and affinity

de Kanter, J. K.; Smorodina, E.; Minnegalieva, A.; Arts, M.; Blaabjerg, L. M.; Frolenkova, M.; Rawat, P.; Wolfram, L.; Britze, H.; Wilke, Y.; Weissenborn, L.; Lindenburg, L.; Engelhart, E.; McGowan, K. L.; Emerson, R.; Lopez, R.; van Bemmel, J. G.; Demharter, S.; Spreafico, R.; Greiff, V.

2026-03-26 molecular biology 10.64898/2026.03.24.713863 medRxiv
Top 0.1%
1.3%

Accurately modeling antibody-antigen interactions requires distinguishing intrinsic binding affinity ("protein-interaction") from protein biophysical properties ("protein-quality"), including folding, stability, and expression. However, high-throughput mutational measurements commonly used to train and benchmark computational models often conflate these effects, obscuring the true determinants of molecular recognition. Here, we present an experimental and analytical framework to disentangle protein-interaction effects from protein-quality effects in single-domain antibody (VHH)-antigen binding. Using a large-scale deep mutational scanning (DMS) dataset spanning four VHH-antigen complexes, with single and double mutations in both partners, we introduce control binders to quantify protein-quality changes independently of protein-interaction. This enables decomposition of experimentally measured affinity into protein-interaction and protein-quality components at scale. Leveraging the disentangled dataset, we evaluated state-of-the-art structure- and sequence-based models for protein-quality and protein-interaction prediction and show that their performance largely reflects protein-quality rather than protein-interaction effects. Our results highlight a major confounder in current datasets and suggest that accounting for protein-quality will be essential for training next-generation affinity-prediction models.

Nomenclature

Antibody-related terms
- Primary VHH: The VHH of a VHH-antigen complex for which the paratope and the epitope were mutated.
- Control VHH: A second VHH that binds to the same antigen as the primary VHH but has non-overlapping epitope positions and therefore does not bind to any of the mutated antigen positions.

Affinity-related terms
- Real affinity: "The strength of the interaction between two [...] molecules that bind reversibly (interact)" [1]. In the context of antibody-antigen binding, it quantifies interactions between active proteins, which are expressed and correctly folded [2] and are therefore functionally and biologically active (see below). It is commonly quantified by the equilibrium dissociation constant, KD.
- Observed affinity (°KD): The interaction strength experimentally measured between two molecules. Unlike real affinity, this value is confounded by the biophysical properties of the individual binding partners, specifically their folding, stability, and expression levels. Consequently, the observed affinity often differs from the real/intrinsic affinity if a significant fraction of the protein population is inactive [3]. NOTE: Unless otherwise specified, °KD is reported in - log10 space. For example, a °KD of -9 corresponds to 10^-9 M or 1 nM.
- Change in observed affinity (Δ°KD): The shift in the observed affinity between two proteins upon mutation, reported as the log10-transformed fold change. A value of 1 reflects a 10-fold difference, a value of 2 a 100-fold difference, etc. This aggregate change resolves into two distinct biophysical components [2, 4]:
  - Protein-interaction change: The change in the intrinsic thermodynamic affinity between the two binding partners, each in its active state (i.e., the specific change in interface Gibbs free energy because both enthalpy and entropy are considered).
  - Protein-quality change: The change in the fraction of the mutated protein population that is biologically active, meaning it is expressed, correctly folded, and stable [2, 5].
    - Folding: The process that guides the polypeptide chain toward its native conformation, which is a prerequisite for forming a functional binding site.
    - Stability: The thermodynamic capacity to maintain the folded structure over time and under physiological conditions. Stability (decrease in Gibbs free energy from the unfolded to the folded state) ensures the binding interface remains intact and prevents competing processes such as aggregation [6].
    - Expression: The steady-state abundance of the protein. This is largely dependent on proper folding and stability, as cellular quality control mechanisms degrade proteins that fail to fold or remain stable at functional concentrations.
- Change in relative affinity (ΔΔ°KD): The difference between the Δ°KD of the primary VHH compared to the control VHH for a given epitope mutation.

Model-related terms
- ESM-IF1 sc: Single-chain (sc) structure-conditioned inverse folding model (ESM-IF1), using the isolated monomer structure of the mutated protein: either the VHH or the antigen [7].
- ESM-IF1 mc: Multi-chain (mc) structure-conditioned model (ESM-IF1), using the full complex structure (both antibody and antigen) [7].
- Stability prediction score: Score that represents the predicted change in stability based on a single mutation, normally represented as ΔΔG.
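
The affinity definitions above can be turned into a small worked example. This is an illustrative sketch only: the sign convention (mutant minus wild type, on log10 KD values such as -9 for 1 nM) is an assumption, the function names are hypothetical, and the numbers are invented rather than taken from the study.

```python
def delta_observed_kd(log10_kd_mutant, log10_kd_wildtype):
    """Change in observed affinity upon mutation, in log10 units.

    Assumed convention: mutant minus wild type, so +1 means a 10-fold
    higher KD (weaker observed binding) for the mutant.
    """
    return log10_kd_mutant - log10_kd_wildtype


def delta_delta_observed_kd(delta_primary, delta_control):
    """Change in relative affinity: primary-VHH shift minus control-VHH shift."""
    return delta_primary - delta_control


# Invented values for one antigen (epitope) mutation.
d_primary = delta_observed_kd(-7.0, -9.0)   # primary VHH: 1 nM -> 100 nM
d_control = delta_observed_kd(-8.0, -8.5)   # control VHH binds elsewhere
dd = delta_delta_observed_kd(d_primary, d_control)
print(d_primary, d_control, dd)
```

Under the assumption that the control VHH shift reflects only the protein-quality change of the mutated antigen, the remaining difference is an estimate of the protein-interaction component for the primary VHH.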

17
Comprehensive characterization of V(D)J recombination from long-read transcriptomic data with VDJcraft

Hu, K.; Rosenberg, A. F.; Song, Y.; Fan, C.-H.; Peng, Z.; Gao, M.; Chong, Z.

2026-04-05 bioinformatics 10.64898/2026.04.01.715879 medRxiv
Top 0.1%
1.2%

V(D)J recombination generates antigen receptor diversity in developing B and T cells. Long-read transcriptome technologies (e.g., PacBio Iso-Seq, Nanopore RNA/cDNA) capture full-length transcripts and thus resolve V(D)J events more accurately than short-read platforms. However, existing short-read tools are not applicable to or optimized for long-read data. We developed VDJcraft, the first integrated pipeline designed for V(D)J recombination analysis using long-read transcriptome sequencing data. The workflow uses a two-pass alignment strategy: global alignment to the GENCODE reference with minimap2, followed by local realignment and annotation using the international ImMunoGeneTics information system (IMGT). A customized module enhances D-gene detection sensitivity and positional precision. Sequencing errors are reduced through consensus-based correction toward the predominant subclass. Antigen-binding regions are annotated using IMGT-defined motifs to characterize CDRs and binding site composition. VDJcraft was validated on simulated and Human Genome Structural Variation Consortium (HGSVC) datasets and applied to disease datasets. It accurately recovered full-length V(D)J-C sequences and outperformed existing methods in gene detection and recombination accuracy. Long-read calls also showed significantly higher concordance with high-confidence short-read calls (Mann-Whitney U test, p = 1.55 × 10^-4). Additionally, we identified 31 putative novel gene subclasses absent from the IMGT database from HGSVC datasets. Analyses of longitudinal blood samples from a COVID-19 patient revealed distinct V(D)J recombination patterns and segment enrichment, characterized by increased IGHV1-2 usage, enrichment of the IGHV3-7/IGHD6-9/IGHJ5_02 rearranged clonotype, and a transient peak in IgG2 levels at day 4 followed by a gradual return to baseline.
In conclusion, VDJcraft provides a robust framework for long-read V(D)J characterization and enables the discovery of disease-associated immune signatures.
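
Gene usage statistics such as the increased IGHV1-2 usage reported above are, at their simplest, per-gene fractions of annotated reads. The sketch below shows that calculation only; it is not part of VDJcraft, the function name is hypothetical, and the gene calls are invented.

```python
from collections import Counter


def gene_usage(calls):
    """Return the fraction of annotated reads assigned to each gene.

    calls: list of gene names, one per annotated read (or per clonotype).
    """
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n / total for gene, n in counts.items()}


# Invented V-gene calls at two time points of a longitudinal series.
baseline = ["IGHV3-7", "IGHV3-7", "IGHV4-34", "IGHV1-2"]
later = ["IGHV1-2", "IGHV1-2", "IGHV3-7", "IGHV4-34"]
usage_baseline = gene_usage(baseline)
usage_later = gene_usage(later)
print(usage_baseline["IGHV1-2"], usage_later["IGHV1-2"])
```

Whether usage is computed per read or per clonotype changes the interpretation, since read-level fractions are weighted by transcript abundance.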

18
Combining Xenium in situ spatial transcriptomics and imaging mass cytometry on a single tissue section

Allen, R.; Duchini, E.; Ameen, F.; Ashhurst, T. M.; Ireland, R.; Conway, J.; Bai, X.; Hong, A.; Ferguson, A. L.; Patrick, E.; Palendira, U.

2026-02-19 immunology 10.64898/2026.02.18.700929 medRxiv
Top 0.2%
1.0%

Spatial imaging technologies provide an expansive view of tissue microenvironments through high-plex profiling of protein and molecular targets in situ. Imaging mass cytometry (IMC; Standard BioTools) is a trusted method for defining immune phenotypes based on up to 40 protein targets, whilst Xenium in situ spatial transcriptomics (Xenium; 10x Genomics) is an emerging platform that can measure up to 5000 mRNA markers simultaneously. Although these platforms can reveal valuable insights on their own, there is an increasing need to analyse samples using a multi-omics approach to further our understanding of complex biological processes. To address this, we have assessed a novel dual-platform workflow that combines Xenium and IMC on a single formalin-fixed paraffin-embedded tissue section to enable the spatial profiling of both mRNA and protein targets at single-cell resolution. The feasibility of the workflow was determined by comparing the staining quality of IMC performed after Xenium to that of IMC performed alone on an adjacent tissue section, confirming that Xenium has little to no negative impact on subsequent IMC protein staining. Although the location of transcripts detected by Xenium correlated with the corresponding proteins detected by IMC at a global scale, discrepancies between the two technologies were apparent at the single-cell level. This is to be expected, as transcript expression does not always correlate with protein expression, and both platforms have their own technical limitations. However, analysing T cells identified by both technologies, as opposed to T cells identified by Xenium or IMC alone, produced the most biologically meaningful results at both the transcript and protein level for specific T cell markers.
These results highlight how integration of the two platforms, identifying the presence of both RNA and protein, can foster a more comprehensive view of cellular landscapes and provide a greater depth of functional capabilities and cellular interactions.

19
Rational Design of Selective IL-2-based Activators for CAR T Cells Using AlphaFold3 and Physics-Informed Machine Learning

Dahmani, L. Z.; Banerjee, A.

2026-03-12 bioinformatics 10.64898/2026.03.10.710391 medRxiv
Top 0.2%
0.9%

Recombinant human Interleukin-2 (rhIL-2, Aldesleukin) is used in immunotherapy for metastatic melanoma and renal cell carcinoma. Low-dose IL-2 has been investigated for administration after adoptive T cell transfer to enhance CAR T expansion and sustain effector function. However, systemic IL-2 can cause severe toxicities and promote expansion of regulatory T cells (Tregs). Previous attempts at mitigating cytokine-mediated side effects involved isolating CAR T cell signaling from endogenous immune responses by developing IL-2/IL-2Rβ-based selective ligand-receptor systems. Expressing these variant orthogonal (ortho)IL-2Rβ receptors in CAR T cells and supplying variant orthoIL-2 was shown to dramatically improve selectivity in CAR T cell expansion and anti-tumoral potency in a leukemia mouse model. This study describes the computational design of synthetic orthogonal cytokine receptor-ligand systems based on the scaffolds of the human canonical IL-2 and IL-2Rβ. Leveraging state-of-the-art AlphaFold3 (AF3) structure prediction capabilities and a physics-informed constrained sequence generator (CSG), the pipeline generates, filters and ranks sets of putative orthoIL-2/orthoIL-2Rβ mutant designs. Variants displaying minimal predicted off-target interactions and enhanced in-target contacts are prioritized for structural modelling. Top designs showed outstanding AF3 structural and interfacial quality metrics ipTM and pTM, with averages between cognate pairs of 0.724 ± 0.05 and 0.770 ± 0.042, respectively. All in silico hits showed ipTM < 0.5 for non-cognates, indicating a good likelihood of orthogonality. Additionally, putative hits showed high levels of predicted structural fidelity to wild-type (WT) human IL-2/IL-2Rβ (PDB: 2ERJ), with an average structural root-mean-square deviation (RMSD) of 0.843 ± 0.375 Å. These mutants incorporated 7-26 interfacial mutations derived from multiple interface selection strategies.
Altogether, the results support the putative foldability and selective affinity of top-ranking mutants displaying metrics close to or within experimental reference range. Finally, strengths and limitations are discussed, alongside the experimental implications of coupling a constrained protein design pipeline to the discovery and validation of selective binders based on naturally occurring scaffolds.
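
The orthogonality criterion described in this abstract (confident cognate pairing, ipTM below 0.5 for every non-cognate pairing) can be expressed as a simple filter. The sketch below is not the authors' pipeline: the function name, the cognate cutoff of 0.7, and the example scores are assumptions chosen for illustration; only the 0.5 non-cognate cutoff comes from the abstract.

```python
def passes_orthogonality(cognate_iptm, noncognate_iptms,
                         cognate_min=0.7, noncognate_max=0.5):
    """Return True if a design looks orthogonal by predicted ipTM.

    cognate_iptm: ipTM of the designed ligand with its designed receptor.
    noncognate_iptms: ipTM values for pairings that should not bind,
        e.g. designed ligand with wild-type receptor and vice versa.
    cognate_min is a hypothetical cutoff; noncognate_max follows the
    abstract's ipTM < 0.5 criterion.
    """
    return cognate_iptm >= cognate_min and all(
        score < noncognate_max for score in noncognate_iptms
    )


# Invented scores for two candidate designs.
designs = {
    "design_a": (0.74, [0.31, 0.44]),
    "design_b": (0.78, [0.29, 0.55]),  # one non-cognate pairing too confident
}
hits = [name for name, (cog, non) in designs.items()
        if passes_orthogonality(cog, non)]
print(hits)
```

A filter of this kind only ranks predictions; the abstract itself frames the resulting designs as putative until validated experimentally.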

20
PMGen: From Peptide-MHC Structure Prediction to Peptide Generation

Asgary, A. H.; Aleyasin, A.; Mehl, J. A.; Fallah, S.; Aintablian, H.; Ludewig, B.; Mishto, M.; Liepe, J.; Soeding, J.

2026-02-25 bioinformatics 10.1101/2025.11.14.688404 medRxiv
Top 0.2%
0.9%

Accurate structural modeling of peptide-MHC (pMHC) complexes is a prerequisite for understanding adaptive immunity and developing data-driven immunotherapies. However, current tools are often limited by narrow class coverage, restricted peptide lengths, or insufficient accuracy for downstream design tasks. Here, we introduce PMGen (Peptide MHC Generator), an integrated framework for structure prediction and structure-guided design of variable-length peptides across MHC class I and II. By introducing Initial Guess and Template Engineering as strategies to enforce anchor constraints in AlphaFold2, PMGen achieves state-of-the-art structural fidelity with median peptide core RMSDs of 0.54 Å for MHC-I and 0.33 Å for MHC-II, outperforming five state-of-the-art methods. We further demonstrate that PMGen captures the subtle structural impact of single-point neoantigen mutations and that model confidence (pLDDT) reliably correlates with structural accuracy. We investigated two potential applications of our framework: structure-aware peptide design and generating data for machine learning (ML) models. To this end, we introduced a framework to sample peptides with preserved structures and improved binding affinity. As an example for ML application, we fine-tuned ProteinMPNN on PMGen-modeled structures. This improved sequence recovery from 0.19 to 0.40 compared to the baseline. Ultimately, PMGen bridges the gap between high-fidelity structural prediction and downstream sequence design, offering a scalable solution to generate the large-scale, high-quality structural datasets required to train advanced predictive models in immunology. Available at https://github.com/soedinglab/PMGen.
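
Sequence recovery, the metric reported above for the fine-tuned ProteinMPNN model, is commonly defined as the fraction of positions at which a designed sequence matches the native one. The sketch below shows that definition under the assumption of equal-length, pre-aligned peptides; the function name and the example peptides are illustrative and not taken from PMGen.

```python
def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences have the same length and are already aligned
    position by position.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Illustrative 9-mer peptides differing at two positions.
recovery = sequence_recovery("GLLGFVFTV", "GILGFVFTL")
print(round(recovery, 3))
```

Averaging this value over a test set of structures gives a single number comparable to the 0.19 and 0.40 figures quoted in the abstract, provided the same positions are scored in both cases.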