LM-QASAS: Reference-free identification of antigen-specific sequences from the BCR repertoire using antibody language models
Masuda, G.; Funakoshi, Y.; Iizumi, S.; Yakushijin, K.; Ohji, G.; Minami, H.; Ohue, M.
The B-cell receptor (BCR) repertoire serves as a historical record of immunological events. However, identifying antigen-specific sequences within this vast dataset remains a challenge, particularly for novel pathogens for which no prior knowledge exists. Time-course analysis methods such as QASAS have proven effective for tracking immune responses, but they rely on existing antibody databases, which limits their applicability to emerging diseases. To overcome this limitation, we introduce LM-QASAS, a reference-free computational framework that integrates antibody language models with repertoire dynamics. By mapping sequences into a high-dimensional semantic embedding space, LM-QASAS identifies functionally convergent clusters: groups of sequences that are semantically similar and exhibit transient expansion upon immune stimulation. In healthy individuals vaccinated with SARS-CoV-2 mRNA vaccines, our method identified spike-specific sequences with over 90% purity, significantly outperforming methods based on simple sequence identity or abundance. Leave-one-out cross-validation demonstrated that LM-QASAS could accurately reconstruct immune dynamics in unseen individuals without external references. In contrast, the method showed limited sensitivity in an influenza vaccine cohort, indicating that the approach is most effective under conditions of robust clonal expansion (high signal-to-noise ratio), such as those induced by mRNA vaccines. LM-QASAS provides a rapid, high-precision platform for monitoring humoral immunity against emerging threats.
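The core idea described above (group sequences that lie close together in embedding space, then keep the groups whose combined frequency rises after stimulation and falls again) can be illustrated with a small sketch. It assumes the per-sequence embeddings have already been produced by some antibody language model and that each sequence's repertoire frequency is known at three or more time points. The greedy cosine clustering, the `threshold` value, and the `transient_expansion` score are hypothetical choices made for illustration only; they are not the published LM-QASAS algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(embeddings: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Hypothetical clustering: join each sequence to the first cluster whose seed
    embedding is at least `threshold` cosine-similar, otherwise start a new cluster."""
    clusters: List[List[str]] = []
    seeds: List[List[float]] = []
    for seq_id, vec in embeddings.items():
        for i, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                clusters[i].append(seq_id)
                break
        else:  # no existing seed was similar enough
            clusters.append([seq_id])
            seeds.append(vec)
    return clusters


def transient_expansion(freqs: Sequence[float], pseudocount: float = 1e-6) -> float:
    """Hypothetical score: peak frequency at intermediate time points divided by the
    larger of the first (baseline) and last (late) frequencies. High = rise then fall."""
    if len(freqs) < 3:
        raise ValueError("need a baseline, at least one intermediate and one late time point")
    peak = max(freqs[1:-1])
    return peak / (max(freqs[0], freqs[-1]) + pseudocount)


def rank_clusters(
    clusters: List[List[str]], freq_table: Dict[str, Sequence[float]]
) -> List[Tuple[float, List[str]]]:
    """Score each cluster on the summed frequencies of its members, highest score first."""
    scored = []
    for members in clusters:
        totals = [sum(col) for col in zip(*(freq_table[m] for m in members))]
        scored.append((transient_expansion(totals), members))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy 3-dimensional "embeddings" (real language-model embeddings are much larger).
embeddings = {
    "s1": [1.0, 0.0, 0.0],
    "s2": [0.95, 0.05, 0.0],
    "s3": [0.0, 1.0, 0.0],
    "s4": [0.0, 0.0, 1.0],
}
# Made-up repertoire frequencies at three time points: baseline, early, late.
freq_table = {
    "s1": [0.001, 0.02, 0.002],
    "s2": [0.000, 0.01, 0.001],
    "s3": [0.010, 0.01, 0.010],
    "s4": [0.005, 0.004, 0.006],
}

clusters = greedy_cluster(embeddings, threshold=0.9)
ranked = rank_clusters(clusters, freq_table)
for score, members in ranked:
    print(f"{score:8.2f}  {members}")
```

In this toy input, `s1` and `s2` have nearly parallel embeddings and merge into one cluster whose summed frequency rises and then falls, so it ranks first, while the stable (`s3`) and flat (`s4`) sequences score below 1. No reference database is consulted at any step. Note that greedy seeding depends on input order; a real pipeline would likely use a more robust clustering method.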