Unsupervised Machine Learning for Adaptive Immune Receptors with immuneML

Pavlovic, M.; Wurtzen, C.; Kanduri, C.; Mamica, M.; Scheffer, L.; Lund-Andersen, C.; Gubatan, J. M.; Ullmann, T.; Greiff, V.; Sandve, G. K.

2026-04-18 bioinformatics
10.64898/2026.04.15.718648 bioRxiv

Machine learning (ML) enables adaptive immune receptor repertoire (AIRR) analyses for biomarker identification and therapeutic development. Because the majority of AIRR data are partially or imperfectly labeled, unsupervised ML is essential for motif discovery, biologically meaningful clustering, and generation of novel receptor sequences. However, no unified framework for unsupervised ML exists in the AIRR field, which hinders the assessment of model robustness and generalizability. Here, we present an immuneML release that advances unsupervised ML in the AIRR field through unified clustering workflows, interpretable generative modeling, integration with protein language model embeddings, dimensionality reduction, and visualization. We demonstrate immuneML's utility in three use cases: (i) benchmarking generative models for epitope-specific sequence generation, assessing specificity and novelty; (ii) systematic evaluation of clustering approaches on experimental receptor sequences against biological properties, such as epitope specificity and MHC; and (iii) unsupervised analysis of an experimental AIRR dataset to examine potential confounding, a practice widespread in related fields but unexplored in AIRR analyses.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Nucleic Acids Research, 1128 papers in training set, Top 2%, 9.9%
2. Bioinformatics, 1061 papers in training set, Top 4%, 6.7%
3. Bioinformatics Advances, 184 papers in training set, Top 0.5%, 6.3%
4. PLOS Computational Biology, 1633 papers in training set, Top 6%, 6.3%
5. Patterns, 70 papers in training set, Top 0.1%, 6.2%
6. Nature Communications, 4913 papers in training set, Top 34%, 4.8%
7. Frontiers in Immunology, 586 papers in training set, Top 2%, 4.2%
8. Computational and Structural Biotechnology Journal, 216 papers in training set, Top 1%, 4.1%
9. Genome Medicine, 154 papers in training set, Top 2%, 3.6%
50% of probability mass above
10. Nature Methods, 336 papers in training set, Top 3%, 3.5%
11. Briefings in Bioinformatics, 326 papers in training set, Top 2%, 3.5%
12. Nature Machine Intelligence, 61 papers in training set, Top 1%, 3.5%
13. NAR Genomics and Bioinformatics, 214 papers in training set, Top 0.8%, 3.5%
14. ImmunoInformatics, 11 papers in training set, Top 0.1%, 3.5%
15. Cell Systems, 167 papers in training set, Top 5%, 2.7%
16. Cell Reports Methods, 141 papers in training set, Top 2%, 1.9%
17. Scientific Reports, 3102 papers in training set, Top 56%, 1.8%
18. Advanced Science, 249 papers in training set, Top 11%, 1.7%
19. BMC Bioinformatics, 383 papers in training set, Top 5%, 1.5%
20. PLOS ONE, 4510 papers in training set, Top 57%, 1.5%
21. Science Advances, 1098 papers in training set, Top 22%, 1.3%
22. Genome Biology, 555 papers in training set, Top 5%, 1.3%
23. Molecular & Cellular Proteomics, 158 papers in training set, Top 1%, 1.1%
24. GigaScience, 172 papers in training set, Top 2%, 1.1%
25. Proceedings of the National Academy of Sciences, 2130 papers in training set, Top 42%, 0.9%
26. Journal of Proteome Research, 215 papers in training set, Top 2%, 0.8%
27. Molecular Systems Biology, 142 papers in training set, Top 2%, 0.7%
28. Cell Genomics, 162 papers in training set, Top 7%, 0.7%
29. Nature Biotechnology, 147 papers in training set, Top 8%, 0.7%
30. Communications Biology, 886 papers in training set, Top 27%, 0.7%
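
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a running total over the listed percentages. The sketch below is illustrative only: the function name `rank_reaching_mass` is hypothetical, and the values are the rounded percentages shown in the list, not the underlying model output.

```python
# Predicted probabilities (in percent) for ranks 1 to 30, copied from the list above.
# These are the rounded values shown on the page, so totals are approximate.
probabilities = [
    9.9, 6.7, 6.3, 6.3, 6.2, 4.8, 4.2, 4.1, 3.6, 3.5,
    3.5, 3.5, 3.5, 3.5, 2.7, 1.9, 1.8, 1.7, 1.5, 1.5,
    1.3, 1.3, 1.1, 1.1, 0.9, 0.8, 0.7, 0.7, 0.7, 0.7,
]


def rank_reaching_mass(probs, threshold=50.0):
    """Return the 1-based rank at which the running total first reaches
    `threshold` percent, or None if the threshold is never reached.

    Hypothetical helper written for this illustration.
    """
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank
    return None


print(rank_reaching_mass(probabilities))  # prints 9
```

With the rounded values, ranks 1 to 8 sum to about 48.5% and ranks 1 to 9 to about 52.1%, which is consistent with the cutoff falling after rank 9.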