
LM-QASAS: Reference-free identification of antigen-specific sequences from the BCR repertoire using antibody language models

Masuda, G.; Funakoshi, Y.; Iizumi, S.; Yakushijin, K.; Ohji, G.; Minami, H.; Ohue, M.

2026-04-01 allergy and immunology
10.64898/2026.03.31.26349834 medRxiv

The B-cell receptor (BCR) repertoire serves as a historical record of immunological events. However, deciphering antigen-specific sequences from this vast dataset remains a challenge, particularly for novel pathogens where prior knowledge is absent. While time-course analysis methods such as QASAS have proven effective for tracking immune responses, they rely on existing antibody databases, limiting their applicability to emerging diseases. To overcome this limitation, we introduce LM-QASAS, a reference-free computational framework that integrates antibody language models with repertoire dynamics. By mapping sequences into a high-dimensional semantic embedding space, LM-QASAS identifies functionally convergent clusters of sequences that are semantically similar and exhibit transient expansion upon immune stimulation. In healthy individuals vaccinated with SARS-CoV-2 mRNA vaccines, our method identified spike-specific sequences with over 90% purity, significantly outperforming methods based on simple sequence identity or abundance. Leave-one-out cross-validation demonstrated that LM-QASAS could accurately reconstruct immune dynamics in unseen individuals without external references. Conversely, the method showed limited sensitivity in an influenza vaccine cohort, revealing that the approach is most effective under conditions of robust clonal expansion (high signal-to-noise ratio), such as those induced by mRNA vaccines. LM-QASAS provides a rapid, high-precision platform for monitoring humoral immunity against emerging threats.
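The abstract describes two ingredients: grouping sequences that lie close together in an embedding space, and keeping only groups whose frequency rises transiently after stimulation. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the embeddings have already been computed by some antibody language model; the function names, the single-linkage clustering rule, the cosine threshold of 0.9, and the 5-fold expansion criterion are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_by_similarity(embeddings: List[Sequence[float]],
                          threshold: float = 0.9) -> List[List[int]]:
    """Single-linkage clustering (assumed rule): link any pair of
    sequences whose cosine similarity is at least `threshold`."""
    parent = list(range(len(embeddings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def is_transiently_expanded(freqs: Sequence[float],
                            fold: float = 5.0,
                            floor: float = 1e-6) -> bool:
    """Assumed criterion: the peak frequency between the first
    (pre-stimulation) and last (late) time point exceeds `fold` times
    the baseline, and the late frequency has dropped below the peak."""
    if len(freqs) < 3:
        return False
    baseline, late = freqs[0], freqs[-1]
    peak = max(freqs[1:-1])
    return peak >= fold * max(baseline, floor) and late < peak


def candidate_clusters(embeddings: List[Sequence[float]],
                       counts: List[Sequence[int]],
                       threshold: float = 0.9,
                       fold: float = 5.0) -> List[List[int]]:
    """counts[s][t] is the read count of sequence s at time point t.
    Returns clusters (lists of sequence indices) that pass both filters."""
    totals = [sum(col) for col in zip(*counts)]  # repertoire size per time point
    hits = []
    for members in cluster_by_similarity(embeddings, threshold):
        freqs = [
            sum(counts[m][t] for m in members) / totals[t] if totals[t] else 0.0
            for t in range(len(totals))
        ]
        if is_transiently_expanded(freqs, fold):
            hits.append(members)
    return hits
```

In this sketch, a cluster that is abundant at every time point is rejected because it never rises above its own baseline, while a small cluster that spikes after stimulation and then contracts is kept. A real pipeline would need a scalable neighbor search in place of the quadratic pairwise loop.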

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Nature Communications: 4913 papers in training set; Top 13%; 12.6%
2. Cell Reports Methods: 141 papers in training set; Top 0.2%; 8.3%
3. Cell Genomics: 162 papers in training set; Top 0.5%; 6.3%
4. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 11%; 6.3%
5. Cell Reports: 1338 papers in training set; Top 10%; 4.8%
6. Nature Machine Intelligence: 61 papers in training set; Top 0.7%; 4.3%
7. Frontiers in Immunology: 586 papers in training set; Top 2%; 4.1%
8. Science Advances: 1098 papers in training set; Top 6%; 3.6%

50% of probability mass above

9. eLife: 5422 papers in training set; Top 26%; 3.6%
10. Cell Systems: 167 papers in training set; Top 4%; 3.0%
11. Communications Biology: 886 papers in training set; Top 3%; 2.9%
12. iScience: 1063 papers in training set; Top 7%; 2.7%
13. Nature Biotechnology: 147 papers in training set; Top 3%; 2.4%
14. Advanced Science: 249 papers in training set; Top 9%; 2.1%
15. Cell Reports Medicine: 140 papers in training set; Top 3%; 2.1%
16. Nature Methods: 336 papers in training set; Top 4%; 1.9%
17. Cellular & Molecular Immunology: 14 papers in training set; Top 0.8%; 1.9%
18. Patterns: 70 papers in training set; Top 0.7%; 1.9%
19. Cell: 370 papers in training set; Top 11%; 1.7%
20. Nucleic Acids Research: 1128 papers in training set; Top 11%; 1.6%
21. Briefings in Bioinformatics: 326 papers in training set; Top 4%; 1.5%
22. PLOS Computational Biology: 1633 papers in training set; Top 19%; 1.3%
23. Nature Immunology: 71 papers in training set; Top 2%; 0.9%
24. Bioinformatics: 1061 papers in training set; Top 9%; 0.8%
25. Nature Computational Science: 50 papers in training set; Top 2%; 0.8%
26. Scientific Reports: 3102 papers in training set; Top 75%; 0.7%
27. Immunity: 58 papers in training set; Top 4%; 0.7%
28. Genome Medicine: 154 papers in training set; Top 9%; 0.6%
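
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked from the ranked list above by accumulating the listed probabilities until the running total reaches 50%. The snippet below is a small sketch of that check; the helper name `journals_to_reach` is hypothetical, and the inputs are the rounded percentages shown on the page, so the result is only as precise as that rounding.

```python
from itertools import accumulate

# Probabilities (in percent) for ranks 1 to 10, copied from the list in rank order.
probs = [12.6, 8.3, 6.3, 6.3, 4.8, 4.3, 4.1, 3.6, 3.6, 3.0]


def journals_to_reach(probs, target=50.0):
    """Return (k, cumulative) for the smallest k whose top-k
    probabilities sum to at least `target`, or None if never reached."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return k, total
    return None


k, total = journals_to_reach(probs)
print(k, round(total, 1))  # prints: 8 50.3
```

The first seven entries sum to 46.7%, so the eighth journal is the one that carries the total past the 50% mark.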