Comprehensive characterization of V(D)J recombination from long-read transcriptomic data with VDJcraft
Hu, K.; Rosenberg, A. F.; Song, Y.; Fan, C.-H.; Peng, Z.; Gao, M.; Chong, Z.
Show abstract
V(D)J recombination generates antigen receptor diversity in developing B and T cells. Long-read transcriptome technologies (e.g., PacBio Iso-Seq, Nanopore RNA/cDNA) capture full-length transcripts and thus resolve V(D)J events more accurately than short-read platforms. However, existing short-read tools are not applicable to or optimized for long-read data. We developed VDJcraft, the first integrated pipeline designed for V(D)J recombination analysis using long-read transcriptome sequencing data. The workflow uses a two-pass alignment strategy: global alignment to the GENCODE reference with minimap2, followed by local realignment and annotation using the international ImMunoGeneTics information system (IMGT). A customized module enhances D-gene detection sensitivity and positional precision. Sequencing errors are reduced through consensus-based correction toward the predominant subclass. Antigen-binding regions are annotated using IMGT-defined motifs to characterize CDRs and binding site composition. VDJcraft was validated on simulated and Human Genome Structural Variation Consortium (HGSVC) datasets and applied to disease datasets. It accurately recovered full-length V(D)J-C sequences and outperformed existing methods in gene detection and recombination accuracy. Long-read calls also showed significantly higher concordance with high-confidence short-read calls (Mann-Whitney U test, p = 1.55 x 10-4). Additionally, we identified 31 putative novel gene subclasses absent from the IMGT database from HGSVC datasets. Analyses of longitudinal blood samples from a COVID-19 patient revealed distinct V(D)J recombination patterns and segment enrichment, characterized by increased IGHV1-2 usage, enrichment of the IGHV3-7/IGHD6-9/IGHJ5_02 rearranged clonotype, and a transient peak in IgG2 levels at day 4 followed by a gradual return to baseline. In conclusion, VDJcraft provides a robust framework for long-read V(D)J characterization and enables the discovery of disease-associated immune signatures.
Matching journals
The top 6 journals account for 50% of the predicted probability mass.