Human-like sequential sound-to-meaning transfer drives artificial speech comprehension

Zhang, S.; Li, S.; Yang, R.; Chen, G.; Tian, X.; Wang, Q.; Fang, F.

2026-05-15 neuroscience

10.64898/2026.05.13.723203 bioRxiv

Artificial intelligence has reached a pivotal threshold. Multimodal large models can approach human-level speech comprehension by rapidly transforming sound into meaning. However, whether this process relies on human-like mechanisms remains unknown. Here, we compared the human brain with twelve speech language models (SLMs) using a phonology-semantics confusion paradigm. Stereo-electroencephalography revealed two mechanisms of phonology-to-semantics (P2S) transfer in the human brain: a local sequential transformation within specific neuronal populations, and a global cross-regional hierarchy of P2S representations. Only brain-model alignment in the local sequential manner predicted model performance. Correspondingly, targeted lesioning of local sequential P2S-transfer model units markedly impaired comprehension performance, while activation steering of these units improved performance. In addition, such local sequential P2S-transfer model units were identified across languages. Together, this study establishes local sequential P2S transformation as a fundamental computational principle shared across biological and artificial intelligence, offering a mechanistic bridge for future brain-inspired speech systems.
