Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/97877
Title: Decoding word embeddings with brain-based semantic features
Authors: Chersoni, E 
Santus, E
Huang, CR 
Lenci, A
Issue Date: Sep-2021
Source: Computational linguistics, Sept. 2021, v. 47, no. 3, p. 663-698
Abstract: Word embeddings are vectorial semantic representations built with either counting or predicting techniques aimed at capturing shades of meaning from word co-occurrences. Since their introduction, these representations have been criticized for lacking interpretable dimensions. This property of word embeddings limits our understanding of the semantic features they actually encode. Moreover, it contributes to the “black box” nature of the tasks in which they are used, since the reasons for word embedding performance often remain opaque to humans. In this contribution, we explore the semantic properties encoded in word embeddings by mapping them onto interpretable vectors, consisting of explicit and neurobiologically motivated semantic features (Binder et al. 2016). Our exploration takes into account different types of embeddings, including factorized count vectors and predict models (Skip-Gram, GloVe, etc.), as well as the most recent contextualized representations (i.e., ELMo and BERT).
In our analysis, we first evaluate the quality of the mapping in a retrieval task, and then we shed light on the semantic features that are better encoded in each embedding type. Finally, a large number of probing tasks is set up to assess how the original and the mapped embeddings perform in discriminating semantic categories. For each probing task, we identify the most relevant semantic features, and we show that there is a correlation between embedding performance and how the embeddings encode those features. This study is a step forward in understanding which aspects of meaning are captured by vector spaces, proposing a new and simple method to carve human-interpretable semantic representations from distributional vectors.
Publisher: MIT Press
Journal: Computational linguistics 
ISSN: 0891-2017
EISSN: 1530-9312
DOI: 10.1162/coli_a_00412
Rights: © 2021 Association for Computational Linguistics
Published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license
The following publication Emmanuele Chersoni, Enrico Santus, Chu-Ren Huang, Alessandro Lenci; Decoding Word Embeddings with Brain-Based Semantic Features. Computational Linguistics 2021; 47 (3): 663–698 is available at https://doi.org/10.1162/coli_a_00412.
Appears in Collections:Journal/Magazine Article

Files in This Item:
File: 2021.cl-3.20.pdf
Size: 809.91 kB
Format: Adobe PDF
Open Access Information
Status open access
File Version Version of Record


SCOPUS™ Citations: 22 (as of Jun 21, 2024)
Web of Science™ Citations: 29 (as of Dec 18, 2025)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.