Reproducible Zero-Shot Decoding of Conceptual Knowledge from Human fMRI: A Systematic Evaluation of the Semantic Output Code Framework

Rahman, M. R.

2026-06-01 neuroscience
10.64898/2026.05.27.728259 bioRxiv

Zero-shot learning from functional magnetic resonance imaging (fMRI) data offers a principled approach to decoding conceptual knowledge without requiring training examples for every target concept. The Semantic Output Code (SOC) framework, introduced by Palatucci et al. [2009], operationalises this idea through a two-stage pipeline: a regression-based mapping from voxel activations to a semantic feature space (the S map), followed by nearest-neighbour retrieval over a semantic knowledge base (the L map). Despite its foundational role in the field, no fully documented, open-source replication of this framework has been published on the original Mitchell et al. [2008] fMRI dataset. We present such a replication and extend it through a systematic evaluation of every major design choice in the pipeline. Using the official 25-verb co-occurrence feature space from Mitchell et al. [2008] and the correlation-stability voxel selection criterion, our pipeline achieves a mean pairwise 2-way forced-choice accuracy of 76.5% (SD = 4.9%, range: 70.0%-84.1%) across all nine subjects of the Mitchell dataset, within 0.5 percentage points of the published benchmark of 77%. We document and resolve a previously unreported evaluation artefact caused by a degenerate zero-vector knowledge base entry for one stimulus word (skyscraper), which suppressed accuracy by approximately 8 percentage points under the broken configuration. Sensitivity analyses across regularisation strength, voxel count, and knowledge base normalisation demonstrate that the pipeline is robust to hyperparameter choice within a broad operating range, with voxel count being the single most impactful factor. Substantial inter-subject variability is documented, with pairwise accuracy ranging from 70.0% (P9) to 84.1% (P1), a spread of 14.1 percentage points that exceeds the difference between our mean and the Mitchell benchmark. 
All code, the expanded 60-word knowledge base, and the complete evaluation pipeline are released as open-source software at https://github.com/Rashed525/fmri-zsl-pipeline.
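The two-stage pipeline and the pairwise 2-way forced-choice score described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the released pipeline: the closed-form ridge solver, the cosine similarity measure, the voxel count of 50, the noise level, and all function names are assumptions made for illustration. Only the 60 words and 25 semantic features mirror figures stated above. The zero-norm guard in `cosine_similarity` shows one possible way to keep a zero-vector knowledge base entry from producing undefined similarities; it is not necessarily the fix the author applied.

```python
from itertools import combinations

import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """S map: closed-form ridge regression, W = (X^T X + alpha I)^-1 X^T Y."""
    n_inputs = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_inputs), X.T @ Y)


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity that returns 0.0 when either vector has ~zero norm.

    Assumption: this guard is one way to handle a degenerate all-zero
    knowledge base row; the source does not specify its own remedy.
    """
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > eps else 0.0


def nearest_neighbour(pred, knowledge_base):
    """L map: index of the knowledge base row most similar to the prediction."""
    sims = [cosine_similarity(pred, row) for row in knowledge_base]
    return int(np.argmax(sims))


def pairwise_match(pred_a, pred_b, true_a, true_b):
    """2-way forced choice: correct if the matched pairing beats the swapped one."""
    matched = cosine_similarity(pred_a, true_a) + cosine_similarity(pred_b, true_b)
    swapped = cosine_similarity(pred_a, true_b) + cosine_similarity(pred_b, true_a)
    return matched > swapped


# Synthetic stand-in for one subject (hypothetical sizes except 60 words, 25 features).
rng = np.random.default_rng(0)
n_words, n_voxels, n_features = 60, 50, 25
S = rng.standard_normal((n_words, n_features))            # semantic knowledge base
B = rng.standard_normal((n_features, n_voxels))           # hidden feature-to-voxel map
X = S @ B + 0.1 * rng.standard_normal((n_words, n_voxels))  # simulated voxel responses

# Leave-two-out evaluation: train on 58 words, test on the held-out pair.
pairs = list(combinations(range(n_words), 2))
correct = 0
for i, j in pairs:
    train = np.setdiff1d(np.arange(n_words), [i, j])
    W = fit_ridge(X[train], S[train], alpha=1.0)
    correct += pairwise_match(X[i] @ W, X[j] @ W, S[i], S[j])

accuracy = correct / len(pairs)
print(f"pairwise accuracy on synthetic data: {accuracy:.3f}")
```

Because the synthetic responses are a nearly noise-free linear function of the features, the score here will be far higher than anything expected from real fMRI data. The sketch is meant only to make the evaluation protocol concrete.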

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. NeuroImage: 813 papers in training set; Top 0.6%; 18.4%
2. Imaging Neuroscience: 242 papers in training set; Top 0.2%; 10.0%
3. Nature Methods: 336 papers in training set; Top 1%; 10.0%
4. Nature Communications: 4913 papers in training set; Top 21%; 9.1%
5. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 16%; 4.3%

50% of probability mass above

6. eLife: 5422 papers in training set; Top 25%; 3.6%
7. Human Brain Mapping: 295 papers in training set; Top 2%; 3.6%
8. Medical Image Analysis: 33 papers in training set; Top 0.4%; 3.6%
9. The Journal of Neuroscience: 928 papers in training set; Top 4%; 2.9%
10. Nature Neuroscience: 216 papers in training set; Top 3%; 2.6%
11. eNeuro: 389 papers in training set; Top 4%; 2.4%
12. Cerebral Cortex: 357 papers in training set; Top 0.5%; 2.4%
13. Neuron: 282 papers in training set; Top 5%; 2.1%
14. Communications Biology: 886 papers in training set; Top 6%; 1.9%
15. PLOS Computational Biology: 1633 papers in training set; Top 15%; 1.8%
16. Nature: 575 papers in training set; Top 11%; 1.8%
17. Scientific Reports: 3102 papers in training set; Top 58%; 1.7%
18. PLOS ONE: 4510 papers in training set; Top 55%; 1.6%
19. Nature Computational Science: 50 papers in training set; Top 1%; 1.2%
20. Aperture Neuro: 18 papers in training set; Top 0.3%; 1.2%
21. Nature Human Behaviour: 85 papers in training set; Top 4%; 0.9%
22. Communications Psychology: 20 papers in training set; Top 0.3%; 0.8%
23. Scientific Data: 174 papers in training set; Top 3%; 0.6%
24. PLOS Biology: 408 papers in training set; Top 23%; 0.6%
25. Network Neuroscience: 116 papers in training set; Top 1%; 0.6%
26. Frontiers in Neuroscience: 223 papers in training set; Top 9%; 0.6%