Human brains implicitly and rapidly distinguish AI from human voices before decoding prosodic meaning

Chen, W.; Pell, M.; Jiang, X.

2026-04-09 neuroscience
10.64898/2026.04.08.716483 bioRxiv
People encounter AI voices daily. Existing behavioral studies suggest listeners rely on prosodic cues such as intonation and expressiveness to detect audio deepfakes, reporting that AI voices sound prosodically less rich than human voices. To test whether prosodic processing drives deepfake discrimination in the neural time course of voice processing, we recorded electroencephalographic (EEG) data while participants listened to human and AI-generated speakers producing utterances in confident vs. doubtful prosody (tone of voice), with attention directed toward memorizing speaker names. We used voice cloning to control for speaker identity confounds between human and AI voices. Multivariate pattern analysis (MVPA) revealed that neural discrimination of human vs. AI voices emerged rapidly regardless of prosody (confident: 176 ms; doubtful: 134 ms), substantially preceding prosody discrimination (confident vs. doubtful within human voices: 2066 ms; within AI voices: 1366 ms). Acoustic analysis confirmed that prosodic distinctions became classifiable only at utterance offset (90% normalized duration), converging with neural evidence that prosody requires near-complete temporal integration. This temporal dissociation between rapid voice source discrimination and late-emerging prosody decoding suggests that prosody plays a smaller role in audio deepfake detection than listeners retrospectively report. Representational similarity analysis further revealed that spectral envelope features (mel-frequency cepstral coefficients; MFCC), rather than the visually salient high-frequency energy differences, drove neural human-AI discrimination, with the earliest independent MFCC contribution (228 ms) closely following the MVPA decoding onset (134-176 ms). Future studies may manipulate specific acoustic components to establish the causal sources of this rapid and sustained neural discrimination.
Significance Statement

People encounter AI voices daily, in phone calls, navigation apps, supermarket checkouts, and subway announcements. Using electroencephalography, we show that the human brain automatically and rapidly distinguishes everyday AI voices from human speech, even without conscious attention to voice source. Although people may attribute this ability to AI voices sounding monotone or prosodically unnatural, the brain relies on subtler acoustic signatures, enabling discrimination before prosodic information becomes available. Attempts to identify the specific acoustic features driving this neural detection were inconclusive, pointing to the need for future causal investigations. We encourage engineers and policymakers to ensure AI voices remain perceptually detectable, as increasingly humanlike AI voices could cognitively disadvantage the general public if they become indistinguishable from human speech.
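The abstract's time-resolved decoding logic (classify human vs. AI trials at each timepoint, then find the latency at which accuracy first exceeds chance) can be sketched on simulated data. This is a minimal illustration, not the authors' pipeline: the nearest-centroid classifier, the simulated EEG dimensions, the injected onset, and the accuracy threshold are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "EEG": trials x channels x timepoints, two classes
# (human vs. AI voice). A class-dependent spatial pattern is
# injected from timepoint 40 onward to mimic a decoding onset.
n_trials, n_channels, n_times = 200, 32, 100
X = rng.normal(size=(n_trials, n_channels, n_times))
y = np.repeat([0, 1], n_trials // 2)
pattern = rng.normal(size=n_channels)
X[y == 1, :, 40:] += 0.5 * pattern[:, None]

def decode_timecourse(X, y, n_folds=5):
    """Cross-validated nearest-centroid decoding accuracy per timepoint."""
    n_trials = len(y)
    folds = np.arange(n_trials) % n_folds
    acc = np.zeros(X.shape[2])
    for t in range(X.shape[2]):
        Xt = X[:, :, t]
        correct = 0
        for f in range(n_folds):
            train, test = folds != f, folds == f
            c0 = Xt[train & (y == 0)].mean(axis=0)  # class-0 centroid
            c1 = Xt[train & (y == 1)].mean(axis=0)  # class-1 centroid
            d0 = np.linalg.norm(Xt[test] - c0, axis=1)
            d1 = np.linalg.norm(Xt[test] - c1, axis=1)
            correct += np.sum((d1 < d0) == (y[test] == 1))
        acc[t] = correct / n_trials
    return acc

acc = decode_timecourse(X, y)
# First timepoint where accuracy clears an (arbitrary) above-chance threshold.
onset = int(np.argmax(acc > 0.65))
```

In the paper the analogous onset is reported in milliseconds (134-176 ms); here it is simply the simulated timepoint index at which the injected signal becomes decodable.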

Matching journals

The top 3 journals together account for just over 50% of the predicted probability mass.

Rank  Journal                                                   Papers in training set  Percentile  Probability
1     eneuro                                                    389                     Top 0.1%    18.8%
2     The Journal of Neuroscience                               928                     Top 0.4%    18.7%
3     eLife                                                     5422                    Top 2%      14.8%
----- 50% of probability mass above --------------------------------------------------------------------------
4     Proceedings of the National Academy of Sciences           2130                    Top 7%      8.5%
5     Philosophical Transactions of the Royal Society B         51                      Top 1.0%    4.2%
6     PLOS Biology                                              408                     Top 2%      4.0%
7     Current Biology                                           596                     Top 6%      3.1%
8     Journal of Cognitive Neuroscience                         119                     Top 0.6%    2.7%
9     Scientific Reports                                        3102                    Top 47%     2.5%
10    Neuron                                                    282                     Top 5%      2.4%
11    Nature Human Behaviour                                    85                      Top 2%      2.1%
12    Cognition                                                 44                      Top 0.2%    1.9%
13    Nature Communications                                     4913                    Top 56%     1.2%
14    PLOS Computational Biology                                1633                    Top 20%     1.2%
15    NeuroImage                                                813                     Top 5%      1.2%
16    Communications Biology                                    886                     Top 16%     1.1%
17    Proceedings of the Royal Society B: Biological Sciences   341                     Top 5%      1.1%
18    Imaging Neuroscience                                      242                     Top 3%      0.9%
19    Cerebral Cortex                                           357                     Top 2%      0.8%
20    PLOS ONE                                                  4510                    Top 68%     0.8%
21    Nature Neuroscience                                       216                     Top 7%      0.6%
22    Journal of Neurophysiology                                263                     Top 1%      0.5%
23    Neuropsychologia                                          77                      Top 2%      0.5%