Comprehensive characterization of V(D)J recombination from long-read transcriptomic data with VDJcraft

Hu, K.; Rosenberg, A. F.; Song, Y.; Fan, C.-H.; Peng, Z.; Gao, M.; Chong, Z.

2026-04-05 bioinformatics
10.64898/2026.04.01.715879 bioRxiv
V(D)J recombination generates antigen receptor diversity in developing B and T cells. Long-read transcriptome technologies (e.g., PacBio Iso-Seq, Nanopore RNA/cDNA) capture full-length transcripts and thus resolve V(D)J events more accurately than short-read platforms. However, existing tools were built for short reads and are neither applicable to nor optimized for long-read data. We developed VDJcraft, the first integrated pipeline designed for V(D)J recombination analysis using long-read transcriptome sequencing data. The workflow uses a two-pass alignment strategy: global alignment to the GENCODE reference with minimap2, followed by local realignment and annotation using the international ImMunoGeneTics information system (IMGT). A customized module enhances D-gene detection sensitivity and positional precision. Sequencing errors are reduced through consensus-based correction toward the predominant subclass. Antigen-binding regions are annotated using IMGT-defined motifs to characterize CDRs and binding site composition. VDJcraft was validated on simulated and Human Genome Structural Variation Consortium (HGSVC) datasets and applied to disease datasets. It accurately recovered full-length V(D)J-C sequences and outperformed existing methods in gene detection and recombination accuracy. Long-read calls also showed significantly higher concordance with high-confidence short-read calls (Mann-Whitney U test, p = 1.55 × 10⁻⁴). Additionally, we identified 31 putative novel gene subclasses in the HGSVC datasets that are absent from the IMGT database. Analyses of longitudinal blood samples from a COVID-19 patient revealed distinct V(D)J recombination patterns and segment enrichment, characterized by increased IGHV1-2 usage, enrichment of the IGHV3-7/IGHD6-9/IGHJ5_02 rearranged clonotype, and a transient peak in IgG2 levels at day 4 followed by a gradual return to baseline.
In conclusion, VDJcraft provides a robust framework for long-read V(D)J characterization and enables the discovery of disease-associated immune signatures.
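
The abstract mentions consensus-based error correction but does not describe how it is done. As a rough illustration of the general idea only, the sketch below takes a per-position majority vote over reads that are assumed to be already aligned to the same length. The function names `consensus` and `count_corrections` are hypothetical, and this is not VDJcraft's implementation.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the per-position majority base of equal-length, pre-aligned reads.

    Assumption: reads are already aligned column by column; a real pipeline
    would need an alignment step first. Ties go to the base seen first.
    """
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must have the same aligned length")
    # zip(*reads) yields one tuple of bases per alignment column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def count_corrections(aligned_reads):
    """Count, per read, how many positions differ from the consensus."""
    reference = consensus(aligned_reads)
    return [
        sum(1 for base, ref in zip(read, reference) if base != ref)
        for read in aligned_reads
    ]


reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTTC"]
print(consensus(reads))
print(count_corrections(reads))
```

A majority vote like this only suppresses errors that are rarer than the true base at each position, which is why it is usually applied within groups of reads believed to share the same underlying sequence.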

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank  Journal                                             Papers in training set  Percentile  Probability
1     Nucleic Acids Research                              1128                    Top 1%      14.0%
2     Genome Medicine                                     154                     Top 0.3%    12.3%
3     Frontiers in Immunology                             586                     Top 0.4%    12.3%
4     Nature Communications                               4913                    Top 31%     6.1%
5     Cell Reports Medicine                               140                     Top 2%      3.5%
6     Cell Reports Methods                                141                     Top 1.0%    3.5%
50% of probability mass above
7     Genome Biology                                      555                     Top 3%      3.5%
8     Briefings in Bioinformatics                         326                     Top 2%      3.5%
9     Advanced Science                                    249                     Top 7%      3.0%
10    PLOS Computational Biology                          1633                    Top 12%     2.7%
11    Bioinformatics                                      1061                    Top 6%      2.5%
12    Cell Genomics                                       162                     Top 3%      1.8%
13    NAR Genomics and Bioinformatics                     214                     Top 2%      1.7%
14    Cell Systems                                        167                     Top 7%      1.6%
15    iScience                                            1063                    Top 16%     1.6%
16    Science Immunology                                  81                      Top 1%      1.6%
17    Computational and Structural Biotechnology Journal  216                     Top 6%      1.3%
18    Nature Methods                                      336                     Top 5%      1.2%
19    Communications Biology                              886                     Top 15%     1.2%
20    mAbs                                                28                      Top 0.2%    1.2%
21    ImmunoInformatics                                   11                      Top 0.1%    1.2%
22    Nature Biotechnology                                147                     Top 6%      1.1%
23    GigaScience                                         172                     Top 3%      0.9%
24    JCI Insight                                         241                     Top 6%      0.9%
25    PLOS ONE                                            4510                    Top 67%     0.8%
26    Cytometry Part A                                    30                      Top 0.3%    0.8%
27    eLife                                               5422                    Top 59%     0.7%
28    Genome Research                                     409                     Top 4%      0.7%
29    Scientific Reports                                  3102                    Top 76%     0.7%
30    Cell Reports                                        1338                    Top 36%     0.6%
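
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is a minimal illustration; the function name `journals_to_reach` is hypothetical, and the values are the first ten probabilities copied from the list.

```python
# Predicted probabilities (percent) for ranks 1-10, copied from the list.
probabilities = [14.0, 12.3, 12.3, 6.1, 3.5, 3.5, 3.5, 3.5, 3.0, 2.7]


def journals_to_reach(probs, threshold):
    """Return (count, cumulative) for the first rank at which the running
    sum of probabilities meets or exceeds the threshold, or None if the
    threshold is never reached."""
    total = 0.0
    for count, value in enumerate(probs, start=1):
        total += value
        if total >= threshold:
            return count, total
    return None


count, cumulative = journals_to_reach(probabilities, 50.0)
print(f"{count} journals reach {cumulative:.1f}% of the probability mass")
```

With these values the running sum is 48.2% after five journals and 51.7% after six, so the cutoff falls at rank 6.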