Temporal Dissociation of Syntactic Disambiguation and Memory Retrieval during Sentence Processing: Naturalistic MEG Evidence from Interpretable Models
Dunagan, D.; Low, D. S.; Yue, S.; Meyer, L.; Hale, J.
Show abstract
Human sentence comprehension proceeds word-by-word, with prior research proposing two central sources of cognitive demand during incremental processing: forward-looking disambiguation of the incoming information stream, and backward-looking retrieval of information associated with previous words from working memory. Recent work has shown that Transformer-based language models successfully generate predictions about sentence processing load in human psycho- and neurolinguistic data by operationalizing disambiguation cost as next-token surprisal, and memory retrieval cost as normalized attention entropy (NAE). Such models, however, remain difficult to interpret as it is not well understood what factors play causally into the decision to assign a cost value to a given word in such artificial neural networks. Here, we present interpretable and cognitively grounded models of disambiguation and memory retrieval and evaluate their neural alignment and spatio-temporal correlates using human magnetoencephalography responses to naturalistic narrative speech. Multivariate temporal response function modeling demonstrates firstly that these human-bias-informed models fare equally well in accounting for observed human language processing data as their Transformer counterparts. This same modeling framework then suggests that surprisal and NAE temporally dissociate in the cortical language network -- surprisal being predictive of bilateral superior temporal gyrus and supramarginal gyrus activation [~]300-500 ms, and NAE being predictive of activity in the same regions, but later [~]750-850 ms. By demonstrating that interpretable neurocomputational models can achieve meaningful brain alignment while maintaining explanatory transparency, this work offers a methodological blueprint for bridging the gap between algorithmic theory and neural implementation.
Matching journals
The top 4 journals account for 50% of the predicted probability mass.