Human-like sequential sound-to-meaning transfer drives artificial speech comprehension
Zhang, S.; Li, S.; Yang, R.; Chen, G.; Tian, X.; Wang, Q.; Fang, F.
Artificial intelligence has reached a pivotal threshold. Large multimodal models can approach human-level speech comprehension by rapidly transforming sound into meaning. However, whether this process relies on human-like mechanisms remains unknown. Here, we compared the human brain with twelve speech language models (SLMs) using a phonology-semantics confusion paradigm. Stereo-electroencephalography revealed two mechanisms of phonology-to-semantics (P2S) transfer in the human brain: a local sequential transformation within specific neuronal populations, and a global cross-regional hierarchy of P2S representations. Only brain-model alignment with the local sequential mechanism predicted model performance. Correspondingly, targeted lesioning of local sequential P2S-transfer model units markedly impaired comprehension, whereas activation steering of these units improved it. In addition, such local sequential P2S-transfer model units were identified across languages. Together, these findings establish local sequential P2S transformation as a fundamental computational principle shared by biological and artificial intelligence, offering a mechanistic bridge for future brain-inspired speech systems.