Reproducible Zero-Shot Decoding of Conceptual Knowledge from Human fMRI: A Systematic Evaluation of the Semantic Output Code Framework
Rahman, M. R.
Zero-shot learning from functional magnetic resonance imaging (fMRI) data offers a principled approach to decoding conceptual knowledge without requiring training examples for every target concept. The Semantic Output Code (SOC) framework, introduced by Palatucci et al. [2009], operationalises this idea through a two-stage pipeline: a regression-based mapping from voxel activations to a semantic feature space (the S map), followed by nearest-neighbour retrieval over a semantic knowledge base (the L map). Despite its foundational role in the field, no fully documented, open-source replication of this framework has been published on the original Mitchell et al. [2008] fMRI dataset. We present such a replication and extend it through a systematic evaluation of every major design choice in the pipeline. Using the official 25-verb co-occurrence feature space from Mitchell et al. [2008] and the correlation-stability voxel selection criterion, our pipeline achieves a mean pairwise 2-way forced-choice accuracy of 76.5% (SD = 4.9%, range: 70.0%-84.1%) across all nine subjects of the Mitchell dataset, within 0.5 percentage points of the published benchmark of 77%. We document and resolve a previously unreported evaluation artefact caused by a degenerate zero-vector knowledge base entry for one stimulus word (skyscraper), which suppressed accuracy by approximately 8 percentage points under the broken configuration. Sensitivity analyses across regularisation strength, voxel count, and knowledge base normalisation demonstrate that the pipeline is robust to hyperparameter choice within a broad operating range, with voxel count being the single most impactful factor. Substantial inter-subject variability is documented, with pairwise accuracy ranging from 70.0% (P9) to 84.1% (P1), a spread of 14.1 percentage points that exceeds the difference between our mean and the Mitchell benchmark. 
All code, the expanded 60-word knowledge base, and the complete evaluation pipeline are released as open-source software at https://github.com/Rashed525/fmri-zsl-pipeline.
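The two-stage pipeline summarised in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the code from the released repository: the function names (`fit_ridge`, `cosine`, `degenerate_rows`, `nearest_concept`, `pairwise_match`, `demo`), the closed-form ridge solver, the use of cosine similarity, and all array sizes other than the 60 words and 25 features are assumptions made for the example. The zero-norm guard shows one plausible way to detect a degenerate knowledge base entry; it is not necessarily how the authors resolved the skyscraper artefact.

```python
import numpy as np

def fit_ridge(X, Y, lam=1.0):
    """S map (assumed form): closed-form ridge regression from voxels to features."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def cosine(a, b, eps=1e-12):
    """Cosine similarity; returns 0.0 when either vector has (near-)zero norm."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < eps or nb < eps:
        return 0.0
    return float(a @ b / (na * nb))

def degenerate_rows(kb, eps=1e-12):
    """Indices of knowledge-base rows whose norm is (near-)zero."""
    return np.flatnonzero(np.linalg.norm(kb, axis=1) < eps)

def nearest_concept(pred, kb):
    """L map: index of the knowledge-base row most similar to the prediction."""
    return int(np.argmax([cosine(pred, row) for row in kb]))

def pairwise_match(pred_a, pred_b, true_a, true_b):
    """2-way forced choice: the correct pairing must outscore the swapped one."""
    correct = cosine(pred_a, true_a) + cosine(pred_b, true_b)
    swapped = cosine(pred_a, true_b) + cosine(pred_b, true_a)
    return correct > swapped

def demo(seed=0, n_words=60, n_feat=25, n_vox=50, noise=0.01):
    """Leave-two-out trial on synthetic data (hypothetical sizes and noise level)."""
    rng = np.random.default_rng(seed)
    F = rng.normal(size=(n_words, n_feat))   # synthetic semantic features
    A = rng.normal(size=(n_feat, n_vox))     # synthetic feature-to-voxel mixing
    X = F @ A + noise * rng.normal(size=(n_words, n_vox))  # synthetic "voxels"
    train = np.arange(n_words - 2)
    test = np.array([n_words - 2, n_words - 1])
    W = fit_ridge(X[train], F[train])        # train on 58 words
    pred = X[test] @ W                       # predict features for 2 held-out words
    return pairwise_match(pred[0], pred[1], F[test[0]], F[test[1]])
```

Calling `demo()` runs a single leave-two-out trial; averaging `pairwise_match` over all held-out word pairs would give a pairwise accuracy of the kind reported above. Running `degenerate_rows` on the knowledge base before evaluation is a cheap way to flag all-zero entries.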