
Interpretability as stability under perturbation reveals systematic inconsistencies in feature attribution

Piorkowska, N. J.; Olejnik, A.; Ostromecki, A.; Kuliczkowski, W.; Mysiak, A.; Bil-Lula, I.

medRxiv preprint (health informatics), posted 2026-04-22. doi: 10.64898/2026.04.20.26351354

Interpreting machine learning models typically relies on feature attribution methods that quantify the contribution of individual variables to model predictions. However, it remains unclear whether attribution magnitude reflects the true functional importance of features for model performance. Here, we present a unified interpretability framework integrating permutation-based attribution, feature ablation, and stability under perturbation across multiple feature spaces. Using nested cross-validation and permutation-based null diagnostics, we systematically evaluate the relationship between attribution magnitude and functional dependence in clinical and biomarker-based prediction models. Attribution magnitude is frequently misaligned with functional importance, with weak to strong negative correlations observed across feature spaces (Spearman ρ ranging from -0.374 to -0.917). Features with high attribution often have limited impact on model performance when removed, whereas features with low attribution can be essential for maintaining predictive accuracy. These discrepancies define distinct classes of interpretability failure, including attribution excess and latent dependence. Interpretability further depends on feature space composition, and stable, functionally relevant features are not necessarily those with the highest attribution scores. By integrating attribution, functional impact, and stability into a composite Feature Reliability Score, we identify features that remain informative across perturbations and analytical contexts. These findings indicate that interpretability does not arise from attribution magnitude alone but is better characterized by stability under perturbation. This framework provides a basis for more robust model interpretation and highlights limitations of attribution-centric approaches in high-dimensional and correlated data settings.
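The core misalignment the abstract describes, a feature with high attribution but little functional importance ("attribution excess"), is easy to reproduce when features are correlated. The sketch below is a toy illustration under assumed data and an ordinary least-squares model, not the authors' pipeline: it compares permutation-based attribution (shuffle one column, keep the fitted model) against feature ablation (refit with the column removed) for two nearly duplicate predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x0 and x1 are near-duplicates; x2 is independent.
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + 0.05 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = x0 + 0.5 * x2 + 0.1 * rng.normal(size=n)

def fit_predict(X_train, y_train, X_eval):
    """Ordinary least squares with intercept; returns predictions on X_eval."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_eval)), X_eval]) @ coef

def r2(y_true, y_pred):
    return 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

base = r2(y, fit_predict(X, y, X))

perm_attr, ablation_impact = [], []
for j in range(X.shape[1]):
    # Permutation attribution: keep the fitted model, shuffle one column at evaluation.
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    perm_attr.append(base - r2(y, fit_predict(X, y, X_perm)))
    # Ablation: refit the model with the feature removed entirely.
    X_abl = np.delete(X, j, axis=1)
    ablation_impact.append(base - r2(y, fit_predict(X_abl, y, X_abl)))

# x0 carries high permutation attribution, yet removing it barely hurts the model,
# because its near-duplicate x1 substitutes for it on refit ("attribution excess").
print(f"x0: perm attribution {perm_attr[0]:.3f}, ablation impact {ablation_impact[0]:.3f}")
print(f"x2: perm attribution {perm_attr[2]:.3f}, ablation impact {ablation_impact[2]:.3f}")
```

For the independent feature x2, the two measures agree; for x0, attribution is large while ablation impact is near zero, the discrepancy the paper's Spearman correlations quantify at scale.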

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Scientific Reports | 3102 | Top 0.7% | 19.1%
2 | npj Digital Medicine | 97 | Top 0.6% | 7.4%
3 | Patterns | 70 | Top 0.1% | 7.0%
4 | Frontiers in Artificial Intelligence | 18 | Top 0.1% | 6.5%
5 | PLOS ONE | 4510 | Top 30% | 5.0%
6 | Physical Biology | 43 | Top 0.3% | 4.3%
7 | PLOS Computational Biology | 1633 | Top 8% | 4.1%
(50% of probability mass above this line)
8 | Computers in Biology and Medicine | 120 | Top 2% | 1.9%
9 | Communications Biology | 886 | Top 7% | 1.8%
10 | Nature Communications | 4913 | Top 49% | 1.8%
11 | JCO Clinical Cancer Informatics | 18 | Top 0.4% | 1.7%
12 | Bioinformatics | 1061 | Top 7% | 1.7%
13 | PLOS Digital Health | 91 | Top 1% | 1.7%
14 | NeuroImage | 813 | Top 4% | 1.4%
15 | Journal of the American Medical Informatics Association | 61 | Top 1% | 1.3%
16 | Frontiers in Bioinformatics | 45 | Top 0.4% | 1.3%
17 | BMC Bioinformatics | 383 | Top 5% | 1.3%
18 | Human Brain Mapping | 295 | Top 3% | 1.1%
19 | Briefings in Bioinformatics | 326 | Top 5% | 1.1%
20 | npj Systems Biology and Applications | 99 | Top 2% | 1.0%
21 | Artificial Intelligence in Medicine | 15 | Top 0.5% | 1.0%
22 | NAR Genomics and Bioinformatics | 214 | Top 3% | 0.9%
23 | GigaScience | 172 | Top 3% | 0.8%
24 | Nature Machine Intelligence | 61 | Top 3% | 0.8%
25 | Computational and Structural Biotechnology Journal | 216 | Top 8% | 0.8%
26 | Statistics in Medicine | 34 | Top 0.3% | 0.8%
27 | Network Neuroscience | 116 | Top 1% | 0.8%
28 | Communications Medicine | 85 | Top 1% | 0.8%
29 | Biology Methods and Protocols | 53 | Top 2% | 0.8%
30 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.6% | 0.8%