Evaluating Expert Specialization in Mixture-of-Experts Antibody Language Models
Burbach, S. M.; Spandau, S.; Hurtado, J.; Briney, B.
Abstract

Antibody language models (AbLMs) show an impressive aptitude for learning antibody features, but tend to struggle to learn the highly diverse, non-templated regions of antibodies. Existing AbLMs use dense architectures, in which all model parameters attend to every amino acid token. We hypothesized that the modular nature of antibodies could benefit from a sparse mixture-of-experts (MoE) architecture, which allows specific subsets of parameters (referred to as experts) to specialize in distinct antibody features. While MoE architectures are widely adopted and optimized in natural language processing, they remain uncommon in biological modeling. We therefore assess existing MoE routing strategies and find that token-choice routing outperforms expert-choice routing, presumably because its experts specialize in CDRH3 residues. We further optimize the token-choice router for AbLMs by minimizing the routing of padding tokens, enabling pre-training on sequences of varying length. Finally, we show that a large-scale baseline antibody language model with a Top-2 MoE architecture (BALM-MoE), trained on a mixture of unpaired and paired antibody sequences, outperforms its dense counterpart with the same number of active parameters.
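To make the routing mechanics concrete, below is a minimal PyTorch sketch of a token-choice Top-2 router that suppresses padding tokens, in the spirit of the optimization described in the abstract. All names here (Top2TokenChoiceRouter, pad_mask, the dimensions) are illustrative assumptions, not taken from the paper; the actual BALM-MoE implementation may differ, and load-balancing losses and expert-capacity handling are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2TokenChoiceRouter(nn.Module):
    """Token-choice routing: each token independently selects its top-2 experts.

    Hypothetical sketch; not the paper's implementation.
    """

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        # Linear gate producing one score per expert for each token
        self.gate = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x: torch.Tensor, pad_mask: torch.Tensor):
        # x: (batch, seq_len, d_model); pad_mask: (batch, seq_len), True at padding
        probs = F.softmax(self.gate(x), dim=-1)  # (batch, seq_len, n_experts)
        # Zero the gate scores of padding tokens so they are never dispatched
        # to an expert and do not consume expert capacity
        probs = probs * (~pad_mask).unsqueeze(-1).to(probs.dtype)
        # Each (non-padding) token chooses its two highest-scoring experts
        top2_vals, top2_idx = probs.topk(2, dim=-1)
        # Renormalize the two selected gate values to sum to 1 per token;
        # clamp avoids division by zero on padding rows (which stay all-zero)
        top2_vals = top2_vals / top2_vals.sum(-1, keepdim=True).clamp_min(1e-9)
        return top2_idx, top2_vals
```

A usage sketch with assumed shapes: with padded batches of antibody sequences, padding positions receive zero gate weight and so route nowhere.

```python
router = Top2TokenChoiceRouter(d_model=512, n_experts=8)
x = torch.randn(2, 160, 512)                    # padded amino acid token embeddings
pad_mask = torch.zeros(2, 160, dtype=torch.bool)
pad_mask[:, 140:] = True                        # last 20 positions are padding
expert_idx, gate_weights = router(x, pad_mask)  # both (2, 160, 2)
```

The contrast with expert-choice routing is that there each expert selects its top-k tokens, so a fixed per-expert budget can crowd out rare but informative positions such as CDRH3 residues, which token-choice routing lets every token keep.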