
Beyond next-word prediction: hierarchical linguistic composition drives LLM-brain alignment in time

Zhao, J.; Brennan, J. R.

2026-05-16 neuroscience
10.64898/2026.05.15.725490 bioRxiv
Abstract

The internal representations of large language models (LLMs) correlate, or "align", with human neural activity during language comprehension. One view holds that this alignment reflects a shared sensitivity to statistical patterns in LLMs and humans; another holds that it reflects, at least in part, the emergence of shared linguistic representations in these systems. Here, we investigate whether hierarchical linguistic composition, a property believed to be fundamental to human language, modulates LLM-brain alignment. To this end, we manipulated syntax, compositional semantics, and associative semantics in English sentences that were presented both to an LLM and to human participants during an electroencephalography (EEG) experiment. The linguistically manipulated stimuli were matched in predictability, which allows us to separate alignment induced by linguistic structure from alignment driven by statistical factors. By comparing LLM-EEG alignment scores, derived from a linear encoding model, across predictability-matched conditions, we evaluate how the linguistic manipulations modulate the alignment between human EEG reading data and contextual embeddings extracted word by word from the hidden layers of GPT2-XL. Three key patterns emerge: (1) increased alignment for word sequences with syntactic structure, (2) decreased alignment for sentences with compositional semantics, and (3) no modulation of alignment by associative semantics. These linguistic modulations of LLM-EEG alignment occur above and beyond predictability. Our results indicate that associative semantics, and at least some aspects of syntactic structure, are encoded similarly by LLMs and the brain, while compositional semantics is more uniquely encoded in the human brain.
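The abstract describes alignment scores derived from a linear encoding model that maps word-by-word GPT2-XL embeddings onto EEG. The sketch below shows one common way such a score can be computed; it is not the authors' pipeline. It assumes ridge regression, contiguous 5-fold cross-validation, and the mean held-out Pearson correlation across channels as the score, fitted separately at each time sample of word-locked epochs to give an alignment time course. The abstract specifies none of these choices. All function names, parameters (`alpha`, `n_folds`, `signal_gain`), and the data are hypothetical: 64 random features stand in for GPT2-XL's 1600-dimensional hidden states, and the "EEG" is simulated.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression weights: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def columnwise_pearson(A, B):
    """Pearson r between matching columns (EEG channels) of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))


def alignment_score(X, Y, alpha=10.0, n_folds=5):
    """Cross-validated encoding score for one time sample (hypothetical choices).

    X: (n_words, n_features) word-by-word LLM embeddings.
    Y: (n_words, n_channels) EEG amplitude at one time sample after word onset.
    Returns the mean held-out Pearson r across channels and folds.
    """
    n_words = X.shape[0]
    # Contiguous folds (no shuffling) so neighbouring words stay in the same fold.
    folds = np.array_split(np.arange(n_words), n_folds)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n_words), test_idx)
        # Standardize features and center EEG using training-set statistics only.
        mu = X[train_idx].mean(axis=0)
        sd = X[train_idx].std(axis=0) + 1e-8
        X_train = (X[train_idx] - mu) / sd
        X_test = (X[test_idx] - mu) / sd
        y_mean = Y[train_idx].mean(axis=0)
        W = ridge_fit(X_train, Y[train_idx] - y_mean, alpha)
        Y_pred = X_test @ W + y_mean
        scores.append(columnwise_pearson(Y_pred, Y[test_idx]).mean())
    return float(np.mean(scores))


def alignment_time_course(X, epochs, alpha=10.0):
    """Fit one encoding model per time sample of word-locked EEG epochs.

    epochs: (n_words, n_channels, n_times). Returns an array of n_times scores.
    """
    return np.array([alignment_score(X, epochs[:, :, t], alpha)
                     for t in range(epochs.shape[2])])


# --- Synthetic demo: random stand-ins for embeddings and EEG, NOT real data ---
rng = np.random.default_rng(0)
n_words, n_features, n_channels, n_times = 400, 64, 32, 6
X = rng.standard_normal((n_words, n_features))
B = rng.standard_normal((n_features, n_channels)) / np.sqrt(n_features)


def simulate_epochs(signal_gain):
    """Noise everywhere, plus signal_gain * (X @ B) added at time samples 2 and 3."""
    epochs = rng.standard_normal((n_words, n_channels, n_times))
    for t in (2, 3):
        epochs[:, :, t] += signal_gain * (X @ B)
    return epochs


# Two hypothetical conditions: stronger vs. weaker embedding-driven EEG signal.
tc_strong = alignment_time_course(X, simulate_epochs(signal_gain=1.0))
tc_weak = alignment_time_course(X, simulate_epochs(signal_gain=0.1))
print(np.round(tc_strong, 2))
print(np.round(tc_weak, 2))
```

In a design like the one described, the same scoring would be applied to each predictability-matched condition and the resulting time courses compared. In this toy setup, the stronger simulated signal yields clearly higher scores at time samples 2 and 3, while the noise-only samples stay near zero.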

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Proceedings of the National Academy of Sciences: 33.0% (top 0.1%; 2130 papers in training set)
2. eLife: 10.4% (top 5%; 5422 papers in training set)
3. The Journal of Neuroscience: 8.4% (top 2%; 928 papers in training set)
(50% of probability mass above)
4. Nature Communications: 4.9% (top 33%; 4913 papers in training set)
5. Neuron: 4.9% (top 3%; 282 papers in training set)
6. Nature Human Behaviour: 4.3% (top 0.6%; 85 papers in training set)
7. PLOS Computational Biology: 4.0% (top 9%; 1633 papers in training set)
8. eNeuro: 2.6% (top 4%; 389 papers in training set)
9. Scientific Reports: 2.1% (top 49%; 3102 papers in training set)
10. Philosophical Transactions of the Royal Society B: 2.1% (top 2%; 51 papers in training set)
11. Neurobiology of Language: 2.1% (top 0.1%; 28 papers in training set)
12. PLOS Biology: 1.7% (top 9%; 408 papers in training set)
13. NeuroImage: 1.7% (top 4%; 813 papers in training set)
14. Cell Reports: 1.7% (top 25%; 1338 papers in training set)
15. Cognition: 1.5% (top 0.3%; 44 papers in training set)
16. Science Advances: 1.3% (top 22%; 1098 papers in training set)
17. Current Biology: 1.1% (top 12%; 596 papers in training set)
18. Nature Neuroscience: 1.0% (top 5%; 216 papers in training set)
19. Communications Biology: 0.9% (top 19%; 886 papers in training set)
20. Journal of Cognitive Neuroscience: 0.8% (top 1%; 119 papers in training set)
21. Advanced Science: 0.7% (top 20%; 249 papers in training set)
22. Brain: 0.7% (top 5%; 154 papers in training set)
23. iScience: 0.6% (top 37%; 1063 papers in training set)
24. Cerebral Cortex: 0.6% (top 2%; 357 papers in training set)