Understanding and Improving Word Embeddings through a Neuroscientific Lens

Fereidooni, S.; Mocz, V.; Radev, D.; Chun, M.

2020-09-20 neuroscience
10.1101/2020.09.18.304436 bioRxiv

Despite the success of models making use of word embeddings on many natural language tasks, these models often perform significantly worse than humans on several natural language understanding tasks. This difference in performance motivates us to ask: (1) whether existing word vector representations have any basis in the brain's representational structure for individual words, and (2) whether features from the brain can be used to improve word embedding model performance, defined as their correlation with human semantic judgements. To answer the first question, we compare the representational spaces of existing word embedding models with that of brain imaging data through representational similarity analysis. We answer the second question by using regression-based learning to constrain word vectors to the features of the brain imaging data, and then test whether these modified word vectors exhibit increased performance over their unmodified counterparts. To collect semantic judgements as a measure of performance, we employed a novel multi-arrangement method. Our results show that there is variance in the representational space of the brain imaging data that remains uncaptured by word embedding models, and that brain imaging data can be used to increase their coherence with human performance.
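
The comparison of representational spaces described in the abstract can be illustrated with a minimal sketch of representational similarity analysis. The abstract does not state which dissimilarity measure or comparison statistic the authors used, so the correlation distance and Spearman correlation below are assumptions, and the arrays `embeddings` and `brain` are random placeholders, not the study's data.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm(features):
    """Condensed dissimilarity matrix: 1 - Pearson r for every pair of rows (words)."""
    return pdist(features, metric="correlation")


def rsa_score(features_a, features_b):
    """Spearman correlation between the two dissimilarity structures (assumed statistic)."""
    rho, _ = spearmanr(rdm(features_a), rdm(features_b))
    return rho


# Placeholder data: 20 words, 50 embedding dimensions, 200 hypothetical voxels.
rng = np.random.default_rng(0)
n_words = 20
embeddings = rng.normal(size=(n_words, 50))
brain = rng.normal(size=(n_words, 200))

score = rsa_score(embeddings, brain)
print(f"RSA score between embedding space and brain space: {score:.3f}")
```

Because the comparison is made between dissimilarity matrices over the same set of words, the two feature spaces do not need to share a dimensionality.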

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | PLOS Computational Biology | 1633 | Top 2% | 12.5%
2 | Proceedings of the National Academy of Sciences | 2130 | Top 5% | 10.1%
3 | Neurobiology of Language | 28 | Top 0.1% | 8.4%
4 | Scientific Reports | 3102 | Top 19% | 6.3%
5 | Cognition | 44 | Top 0.1% | 3.6%
6 | Nature Human Behaviour | 85 | Top 0.9% | 3.6%
7 | Frontiers in Computational Neuroscience | 53 | Top 0.7% | 3.6%
8 | PLOS ONE | 4510 | Top 45% | 2.6%
50% of probability mass above
9 | NeuroImage | 813 | Top 3% | 2.1%
10 | Communications Psychology | 20 | Top 0.1% | 2.1%
11 | Communications Biology | 886 | Top 5% | 2.1%
12 | Nature Communications | 4913 | Top 48% | 1.9%
13 | Frontiers in Neuroscience | 223 | Top 3% | 1.9%
14 | Imaging Neuroscience | 242 | Top 2% | 1.9%
15 | Neural Networks | 32 | Top 0.4% | 1.8%
16 | Journal of Cognitive Neuroscience | 119 | Top 0.8% | 1.7%
17 | eNeuro | 389 | Top 5% | 1.7%
18 | Journal of Neural Engineering | 197 | Top 1% | 1.7%
19 | Journal of Neurophysiology | 263 | Top 0.5% | 1.5%
20 | The Journal of Neuroscience | 928 | Top 6% | 1.5%
21 | Journal of Vision | 92 | Top 0.3% | 1.3%
22 | Human Brain Mapping | 295 | Top 3% | 1.2%
23 | eLife | 5422 | Top 50% | 1.1%
24 | Cerebral Cortex | 357 | Top 1% | 1.0%
25 | iScience | 1063 | Top 25% | 1.0%
26 | Heliyon | 146 | Top 6% | 0.8%
27 | Neural Computation | 36 | Top 0.7% | 0.7%
28 | Frontiers in Human Neuroscience | 67 | Top 3% | 0.7%
29 | Nature Neuroscience | 216 | Top 6% | 0.7%
30 | Nature Computational Science | 50 | Top 2% | 0.7%
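
The 50% cutoff stated at the top of this listing can be checked from the predicted probabilities themselves. The sketch below accumulates the values for the ten highest-ranked journals, copied from the listing, and reports the first rank at which the running total reaches the threshold. The function name `journals_to_reach` is a hypothetical helper for illustration, not part of the site.

```python
# Predicted probabilities (in percent) for ranks 1-10, copied from the listing.
probs = [12.5, 10.1, 8.4, 6.3, 3.6, 3.6, 3.6, 2.6, 2.1, 2.1]


def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative mass) at the first rank where the running sum meets the threshold."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None  # threshold not reached within the supplied values


k, mass = journals_to_reach(probs, 50.0)
print(f"Top {k} journals cover {mass:.1f}% of the probability mass")
```

The running total is 48.1% after seven journals and passes 50% only when the eighth is included, which matches the stated cutoff.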