
Mechanistic Interpretability for Protein Language Models: A Validation Framework

Chon, P.; Andreopoulos, W. B.

2026-06-02 bioinformatics
10.64898/2026.05.29.727021 bioRxiv

Protein language models (PLMs) are powerful predictors of protein structure and function, but their internal mechanisms remain poorly understood. Recent mechanistic interpretability methods have decomposed PLM representations into interpretable features, but these methods have not been combined on a single biologically meaningful task. This paper tests whether an InterPLM sparse autoencoder and a ProtoMech cross-layer transcoder can discover features in ESM-2 (6 layers, 8M parameters) that primarily discriminate Class A β-lactamases from Class B β-lactamases, with Classes C and D used as more challenging comparisons. The main goal is to find features that are distinct to Class A β-lactamases and not shared by the other classes. Both methods find such features, but the cross-layer transcoder shows that the concept of a Class A β-lactamase appears to be distributed across several nodes, for example in layers 4 and 6, rather than localized to a single node. We also present a validation framework that guards against overclaiming the role of a node, and we use it to show that several strong nodes fail at some stages of the framework, meaning that none of them can be the sole node that defines Class A β-lactamase.
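As a rough illustration of what "distinct to Class A and not shared by other classes" could mean operationally, the sketch below flags features that fire on most Class A sequences and on almost none of the sequences in any other class. It is not the authors' validation framework: the function name `class_specific_features`, the per-sequence activation format, and the thresholds `min_on`, `max_off`, and `thresh` are all assumptions made for this example, and the data are toy values.

```python
def class_specific_features(acts, labels, target, min_on=0.8, max_off=0.1, thresh=0.0):
    """Return indices of features that are common in `target` and rare elsewhere.

    acts   : list of per-sequence activation vectors (hypothetical format,
             e.g. one pooled value per feature per sequence).
    labels : list of class labels, one per sequence ("A", "B", "C", "D").
    A feature counts as "firing" on a sequence when its activation > thresh.
    """
    n_features = len(acts[0])
    classes = sorted(set(labels))
    specific = []
    for j in range(n_features):
        # Fraction of sequences in each class on which feature j fires.
        rates = {}
        for cls in classes:
            rows = [row for row, lab in zip(acts, labels) if lab == cls]
            rates[cls] = sum(row[j] > thresh for row in rows) / len(rows)
        # The feature must be rare in every non-target class, not just one.
        rare_elsewhere = all(r <= max_off for c, r in rates.items() if c != target)
        if rates[target] >= min_on and rare_elsewhere:
            specific.append(j)
    return specific


# Toy data: feature 0 fires only on Class A, feature 1 also fires on
# Classes C and D, and feature 2 never fires.
acts = [
    [1.0, 0.9, 0.0],  # A
    [0.8, 0.7, 0.0],  # A
    [0.0, 0.0, 0.0],  # B
    [0.0, 0.0, 0.0],  # B
    [0.0, 0.6, 0.0],  # C
    [0.0, 0.5, 0.0],  # D
]
labels = ["A", "A", "B", "B", "C", "D"]
specific = class_specific_features(acts, labels, target="A")
print(specific)
```

In this toy case feature 1 would look Class A-specific if only Classes A and B were compared, which is why checking against Classes C and D as harder comparisons matters. Passing a filter like this is only a necessary condition; it says nothing about whether a single feature is sufficient to define the class.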

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics | 1061 papers in training set | Top 1% | 19.2%
2. BMC Bioinformatics | 383 papers in training set | Top 0.7% | 12.2%
3. PLOS Computational Biology | 1633 papers in training set | Top 2% | 12.2%
4. Briefings in Bioinformatics | 326 papers in training set | Top 0.9% | 6.3%
5. Bioinformatics Advances | 184 papers in training set | Top 0.9% | 4.2%
50% of probability mass above
6. Journal of Chemical Information and Modeling | 207 papers in training set | Top 1% | 3.9%
7. Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.2% | 3.5%
8. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 2.7%
9. Scientific Reports | 3102 papers in training set | Top 45% | 2.6%
10. Journal of Molecular Biology | 217 papers in training set | Top 1% | 2.3%
11. Journal of Cheminformatics | 25 papers in training set | Top 0.3% | 1.7%
12. Computers in Biology and Medicine | 120 papers in training set | Top 2% | 1.7%
13. NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.7%
14. Frontiers in Molecular Biosciences | 100 papers in training set | Top 2% | 1.3%
15. Computational Biology and Chemistry | 23 papers in training set | Top 0.2% | 1.3%
16. IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.3% | 1.3%
17. Frontiers in Bioinformatics | 45 papers in training set | Top 0.5% | 1.2%
18. PLOS ONE | 4510 papers in training set | Top 60% | 1.2%
19. Nature Machine Intelligence | 61 papers in training set | Top 3% | 0.9%
20. Frontiers in Genetics | 197 papers in training set | Top 8% | 0.9%
21. Physical Biology | 43 papers in training set | Top 2% | 0.7%
22. Nucleic Acids Research | 1128 papers in training set | Top 20% | 0.6%
23. Nature Communications | 4913 papers in training set | Top 66% | 0.6%
24. Cell Systems | 167 papers in training set | Top 14% | 0.6%