EVALution-MAN : a Chinese dataset for the training and evaluation of DSMs

Liu, H; Neergaard, K; Santus, E; Huang, CR

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/98202

Title:	EVALution-MAN : a Chinese dataset for the training and evaluation of DSMs
Authors:	Liu, H; Neergaard, K; Santus, E; Huang, CR
Issue Date:	May-2016
Source:	In N Calzolari, K Choukri, T Declerck, S Goggi, M Grobelnik, B Maegaard, J Mariani, H Mazo, A Moreno, J Odijk & S Piperidis (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), p. 4583-4587. Portorož, Slovenia : European Language Resources Association (ELRA), 2016
Abstract:	Distributional semantic models (DSMs) are currently used to measure word relatedness and word similarity. One shortcoming of DSMs is that they do not provide a principled way to discriminate between different semantic relations. Several approaches have been adopted that rely on annotated data, either in the training of the model or later in its evaluation. In this paper, we introduce a dataset for training and evaluating DSMs on the discrimination of semantic relations between words in Mandarin Chinese. The construction of the dataset followed EVALution 1.0, an English dataset for the training and evaluation of DSMs. The dataset contains 360 relation pairs distributed across five semantic relations: antonymy, synonymy, hypernymy, meronymy and near-synonymy. All relation pairs were checked manually to estimate their quality. The 360 word relation pairs contain 373 relata, all of which were extracted and then manually tagged according to their semantic type. The frequency of the relata was calculated in a combined corpus of Sinica and Chinese Gigaword. To the best of our knowledge, EVALution-MAN is the first dataset of its kind for Mandarin Chinese.
Keywords:	Dataset; Distributional semantic models; Training; Evaluating
Publisher:	European Language Resources Association (ELRA)
ISBN:	978-2-9517408-9-1
Description:	Tenth International Conference on Language Resources and Evaluation (LREC'16), May 23-28, 2016, Portorož, Slovenia
Rights:	Copyright by the European Language Resources Association. The LREC 2016 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/). The following publication: Liu Hongchao, Karl Neergaard, Enrico Santus, and Chu-Ren Huang. 2016. EVALution-MAN: A Chinese Dataset for the Training and Evaluation of DSMs. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4583–4587, Portorož, Slovenia. European Language Resources Association (ELRA), is available at https://aclanthology.org/L16-1726.
Appears in Collections:	Conference Paper

Files in This Item:

File	Description	Size	Format
Huang_Evalution-Man_Chinese_Dataset.pdf		146.48 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record
