Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/67953
Title: EVALution-MAN: a Chinese dataset for the training and evaluation of DSMs
Authors: Liu, H
Neergaard, KD
Santus, E
Huang, CR 
Keywords: Dataset
Distributional semantic models
Training
Evaluating
Issue Date: 2016
Source: The 10th Language Resources and Evaluation Conference, Portorož, Slovenia, 23-28 May 2016, p.4583-4587
Abstract: Distributional semantic models (DSMs) are currently used to measure word relatedness and word similarity. One shortcoming of DSMs is that they do not provide a principled way to discriminate between different semantic relations. Several approaches have been adopted that rely on annotated data, either in training the model or later in evaluating it. In this paper, we introduce a dataset for training and evaluating DSMs on the discrimination of semantic relations between words in Mandarin Chinese. The construction of the dataset followed EVALution 1.0, an English dataset for the training and evaluation of DSMs. The dataset contains 360 relation pairs, distributed across five semantic relations: antonymy, synonymy, hypernymy, meronymy and near-synonymy. All relation pairs were checked manually to assess their quality. The 360 word-relation pairs contain 373 relata, all of which were extracted and subsequently tagged manually according to their semantic type. The frequency of each relatum was calculated in a combined corpus of Sinica and Chinese Gigaword. To the best of our knowledge, EVALution-MAN is the first dataset of its kind for Mandarin Chinese.
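To illustrate the relata-extraction and frequency-counting steps described in the abstract, the following Python sketch collects the distinct words from a list of relation pairs and counts their occurrences in a corpus. It is a minimal sketch under stated assumptions, not the authors' pipeline: the relation labels, the (word1, word2, relation) tuple layout, the example pairs and the toy corpus are all hypothetical and do not come from EVALution-MAN. The corpus is also assumed to be already word-segmented, since written Chinese does not mark word boundaries with spaces.

```python
from collections import Counter

# Hypothetical relation labels mirroring the five relations named in the
# abstract; the dataset's actual label strings may differ.
RELATIONS = {"antonym", "synonym", "hypernym", "meronym", "near-synonym"}

# Hypothetical (word1, word2, relation) tuples for illustration only;
# these are NOT entries taken from EVALution-MAN.
PAIRS = [
    ("大", "小", "antonym"),      # big / small
    ("狗", "动物", "hypernym"),   # dog / animal
    ("汽车", "车轮", "meronym"),  # car / wheel
    ("美丽", "漂亮", "synonym"),  # beautiful / pretty
]


def extract_relata(pairs):
    """Return the set of distinct words (relata) occurring in any pair."""
    relata = set()
    for w1, w2, rel in pairs:
        if rel not in RELATIONS:
            raise ValueError(f"unknown relation label: {rel}")
        relata.update((w1, w2))
    return relata


def relata_frequencies(relata, tokens):
    """Map each relatum to its count in a pre-segmented list of tokens."""
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur in the corpus.
    return {w: counts[w] for w in sorted(relata)}


if __name__ == "__main__":
    # Toy, already-segmented corpus (hypothetical).
    corpus = ["狗", "是", "动物", "大", "狗", "小", "车轮"]
    relata = extract_relata(PAIRS)
    freqs = relata_frequencies(relata, corpus)
    print(len(relata))                 # 8
    print(freqs["狗"], freqs["汽车"])  # 2 0
```

Counting over a pre-segmented token list keeps the frequency step independent of any particular word segmenter. With real data, the token list would come from the combined Sinica and Chinese Gigaword corpus mentioned in the abstract.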
URI: http://hdl.handle.net/10397/67953
Appears in Collections: Conference Paper

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.