Brain2VLM: Hierarchical Alignment Between Cortical Representations and Vision-Language Latent Spaces

Pritam, N. A. A.; O, J. S.; Jain, S.

2026-04-23 neuroscience
10.64898/2026.04.23.720313 bioRxiv

This work introduces Brain2VLM, a framework for analyzing how cortical representations align with latent spaces of pretrained diffusion-based vision-language models for brain-to-image reconstruction. While recent approaches achieve strong performance by mapping functional Magnetic Resonance Imaging (fMRI) signals to model latents, the structure of this mapping remains poorly understood. We hypothesize that brain-to-latent alignment is hierarchical, with early visual cortex exhibiting approximately linear correspondence to structural diffusion latents, and higher-order visual areas requiring nonlinear mappings to align with semantic embedding spaces. To test this, we decode diffusion latents and CLIP embeddings from fMRI signals using both linear ridge regression and a nonlinear residual MLP on the Natural Scenes Dataset. Our results reveal that nonlinear decoding provides only marginal improvements for diffusion latents (Δ ≈ 0.05-0.06 in correlation), but yields substantial gains for semantic embeddings (Δ ≈ 0.47), significantly improving distributional alignment (MMD: 0.042 vs 0.358). However, increased decoder expressivity can introduce shifts in latent distributions, highlighting a trade-off between prediction accuracy and generative compatibility. Despite using a simple reconstruction pipeline, Brain2VLM achieves strong performance (PixCorr 0.33, CLIP 85%), suggesting that improvements in brain-to-latent alignment play an important role in reconstruction quality alongside generative modeling. These findings provide empirical evidence for hierarchical alignment between cortical representations and model latent spaces, positioning the brain-to-latent interface as a primary bottleneck in brain decoding systems. Our code can be found at https://github.com/adarsh-crafts/Brain2VLM
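
The abstract evaluates decoders by prediction correlation and by distributional alignment (MMD). The following is a minimal illustrative sketch of those two quantities for a linear ridge decoder, not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data standing in for fMRI responses and latents, the helper names `fit_ridge`, `mean_columnwise_corr`, and `rbf_mmd2`, the RBF kernel, the biased MMD estimator, and the regularization strength. The abstract does not specify any of these choices.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)

def mean_columnwise_corr(Y_true, Y_pred):
    """Pearson correlation per latent dimension, averaged over dimensions."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float(np.mean((yt * yp).sum(axis=0) / denom))

def rbf_mmd2(A, B, gamma=1.0):
    """Biased estimate of squared MMD between samples A and B (RBF kernel)."""
    def kernel(P, Q):
        sq = (P ** 2).sum(axis=1)[:, None] + (Q ** 2).sum(axis=1)[None, :] - 2.0 * P @ Q.T
        return np.exp(-gamma * np.maximum(sq, 0.0))
    return float(kernel(A, A).mean() + kernel(B, B).mean() - 2.0 * kernel(A, B).mean())

# Synthetic stand-ins (assumption): 200 "trials", 50 "voxels", 8 latent dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
W_true = rng.standard_normal((50, 8))
Y = X @ W_true + 0.1 * rng.standard_normal((200, 8))

# Fit on the first 150 trials, evaluate on the held-out 50.
W = fit_ridge(X[:150], Y[:150], alpha=1.0)
Y_pred = X[150:] @ W

corr = mean_columnwise_corr(Y[150:], Y_pred)
mmd2 = rbf_mmd2(Y[150:], Y_pred, gamma=1.0 / Y.shape[1])
print(f"mean correlation: {corr:.3f}, squared MMD: {mmd2:.4f}")
```

In this sketch, correlation measures how well each predicted latent dimension tracks its target, while MMD compares the overall distribution of predicted latents with that of the true latents. The two can diverge, which is the kind of trade-off between prediction accuracy and generative compatibility that the abstract describes.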

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Imaging Neuroscience | 242 papers in training set | Top 0.1% | 19.0%
2 | PLOS Computational Biology | 1633 papers in training set | Top 3% | 10.6%
3 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 7% | 9.3%
4 | Medical Image Analysis | 33 papers in training set | Top 0.1% | 8.5%
5 | Nature Communications | 4913 papers in training set | Top 25% | 7.3%
50% of probability mass above
6 | Human Brain Mapping | 295 papers in training set | Top 1.0% | 6.5%
7 | NeuroImage | 813 papers in training set | Top 2% | 4.9%
8 | Nature Neuroscience | 216 papers in training set | Top 2% | 4.0%
9 | Nature Methods | 336 papers in training set | Top 3% | 4.0%
10 | Cell Reports | 1338 papers in training set | Top 22% | 1.9%
11 | Nature | 575 papers in training set | Top 11% | 1.7%
12 | IEEE Transactions on Medical Imaging | 18 papers in training set | Top 0.3% | 1.7%
13 | Journal of Neural Engineering | 197 papers in training set | Top 1% | 1.7%
14 | Nature Human Behaviour | 85 papers in training set | Top 2% | 1.7%
15 | Communications Biology | 886 papers in training set | Top 12% | 1.3%
16 | Neuron | 282 papers in training set | Top 6% | 1.3%
17 | Scientific Reports | 3102 papers in training set | Top 65% | 1.2%
18 | eLife | 5422 papers in training set | Top 51% | 1.0%
19 | PLOS ONE | 4510 papers in training set | Top 67% | 0.8%
20 | Nature Computational Science | 50 papers in training set | Top 2% | 0.8%
21 | Network Neuroscience | 116 papers in training set | Top 1% | 0.7%
22 | Science Advances | 1098 papers in training set | Top 33% | 0.7%
23 | Nature Biotechnology | 147 papers in training set | Top 9% | 0.5%