Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/67952
Title: What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
Authors: Santus, E
Lenci, A
Chiu, TS
Lu, Q 
Huang, CR 
Keywords: Vector space models
VSMs
Distributional semantic models
DSMs
Semantic relations
Word similarity
Issue Date: 2016
Source: The 10th Language Resources and Evaluation Conference, Portorož, Slovenia, 23-28 May 2016, pp. 4565-4572
Abstract: In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting this intersection according to the rank of the shared contexts in the dependency-ranked lists. This claim comes from the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts than other related words do. To prove it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms Vector Cosine and the co-occurrence baseline on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, therefore beating the non-English US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
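A minimal Python sketch of the ranked-intersection idea described in the abstract. It assumes, as an interpretation of the abstract rather than a quotation of the paper's formula, that APSyn sums, over the contexts shared by the two words' top-N most associated context lists, the inverse of the average rank of each shared context; the function and variable names are illustrative.

def apsyn(contexts_w1, contexts_w2, n=1000):
    # contexts_w* are lists of context features, sorted from most to least
    # associated with the target word (position 0 corresponds to rank 1).
    rank1 = {f: i + 1 for i, f in enumerate(contexts_w1[:n])}
    rank2 = {f: i + 1 for i, f in enumerate(contexts_w2[:n])}
    shared = rank1.keys() & rank2.keys()
    # Each shared context contributes 1 / mean(rank in list 1, rank in list 2),
    # so contexts that are highly ranked for both words contribute the most.
    return sum(1.0 / ((rank1[f] + rank2[f]) / 2.0) for f in shared)

On a multiple-choice ESL or TOEFL item, the score would be computed between the problem word and each candidate, and the highest-scoring candidate selected as the answer.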
URI: http://hdl.handle.net/10397/67952
Appears in Collections: Conference Paper
