Dissociable frequency regimes in human temporal cortex integrate facial and acoustic cues during natural speech
Li, J.; Bian, K.; Hao, X.; Wu, J.; Lu, J.; Li, Y.
Face-to-face communication relies on the seamless integration of visual and acoustic cues, yet the spatiotemporal principles governing how the human brain dynamically represents and combines these multisensory streams remain largely unresolved. To address this, we recorded high-density electrocorticography (ECoG) from eight participants perceiving matched audiovisual, audio-only, and video-only continuous natural Mandarin speech. Using time-frequency-resolved encoding models, we reveal complementary, frequency-dependent integration regimes across the temporal lobe. We show that the superior temporal gyrus (STG) implements a feature-selective, auditory-dominant strategy, utilizing visual input to selectively strengthen low-frequency representations of lip-reading kinematics. Conversely, the middle temporal gyrus (MTG) acts as a higher-order multisensory hub, employing a frequency-selective strategy to broadly integrate diverse facial and articulatory features. Crucially, we demonstrate that access to visual information during perception significantly improves the acoustic and lexical accuracy of neural speech decoding and re-synthesis, with the MTG driving the largest gains in linguistic intelligibility. These findings uncover the dissociable neural architectures supporting robust multisensory perception, providing critical mechanistic insights for the development of next-generation, multimodal brain-computer interfaces.
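The "time-frequency-resolved encoding models" mentioned above generally map time-lagged stimulus features (acoustic and facial) onto band-limited neural power at each electrode, fit separately per frequency band and per condition (audiovisual, audio-only, video-only). The sketch below illustrates that general approach with cross-validated ridge regression on synthetic data; the feature set, lag window, regularization, and helper names are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of a frequency-resolved linear encoding model.
# Assumes ridge regression on time-lagged stimulus features; all
# parameters and data here are synthetic placeholders for illustration.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def lag_features(X, n_lags):
    """Stack time-lagged copies of a (T x F) feature matrix into (T x F*n_lags)."""
    T, F = X.shape
    lagged = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        lagged[lag:, lag * F:(lag + 1) * F] = X[:T - lag, :]
    return lagged

def fit_encoding_model(stim, neural, n_lags=20, alpha=1.0, n_splits=5):
    """Return cross-validated prediction accuracy (Pearson r) per electrode."""
    Xl = lag_features(stim, n_lags)
    scores = np.zeros(neural.shape[1])
    kf = KFold(n_splits=n_splits, shuffle=False)  # contiguous folds for time series
    for train, test in kf.split(Xl):
        model = Ridge(alpha=alpha).fit(Xl[train], neural[train])
        pred = model.predict(Xl[test])
        for e in range(neural.shape[1]):
            scores[e] += np.corrcoef(pred[:, e], neural[test, e])[0, 1] / n_splits
    return scores

# Synthetic example: acoustic + facial features predicting one band's power.
# In a real analysis this would be repeated per frequency band (e.g. theta,
# high-gamma) and per condition to compare integration regimes across regions.
rng = np.random.default_rng(0)
T, n_features, n_electrodes = 2000, 8, 16
stim = rng.standard_normal((T, n_features))  # e.g. spectrogram + lip-kinematic features
neural_band = stim @ rng.standard_normal((n_features, n_electrodes)) \
    + 0.5 * rng.standard_normal((T, n_electrodes))
r = fit_encoding_model(stim, neural_band)
print("mean encoding accuracy:", round(float(r.mean()), 3))
```

Comparing such per-band accuracies between audiovisual and unimodal fits is one way the feature-selective versus frequency-selective distinction described above could be quantified, though the specific model family and validation scheme used in the study may differ.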