Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/105726
DC Field | Value | Language
dc.contributor | Department of Computing | -
dc.creator | Li, M | -
dc.creator | Long, Y | -
dc.creator | Qin, L | -
dc.creator | Li, W | -
dc.date.accessioned | 2024-04-15T07:36:15Z | -
dc.date.available | 2024-04-15T07:36:15Z | -
dc.identifier.isbn | 978-2-9517408-9-1 | -
dc.identifier.uri | http://hdl.handle.net/10397/105726 | -
dc.description | Tenth International Conference on Language Resources and Evaluation (LREC'16), May 23-28, 2016, Portorož, Slovenia | en_US
dc.language.iso | en | en_US
dc.publisher | Association for Computational Linguistics (ACL) | en_US
dc.rights | Copyright by the European Language Resources Association | en_US
dc.rights | The LREC 2016 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by/4.0/) | en_US
dc.rights | The following publication Minglei Li, Yunfei Long, Lu Qin, and Wenjie Li. 2016. Emotion Corpus Construction Based on Selection from Hashtags. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1845–1849, Portorož, Slovenia. European Language Resources Association (ELRA) is available at https://aclanthology.org/L16-1291/. | en_US
dc.title | Emotion corpus construction based on selection from hashtags | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 1845 | -
dc.identifier.epage | 1849 | -
dcterms.abstract | The availability of a labelled corpus is of great importance for supervised learning in emotion classification tasks. Because it is time-consuming to label text manually, hashtags have been used as naturally annotated labels to obtain a large amount of labelled training data from microblogs. However, natural hashtags contain too much noise to be used directly in learning algorithms. In this paper, we design a three-stage semi-automatic method to construct an emotion corpus from microblogs. First, a lexicon-based voting approach is used to verify the hashtag automatically. Second, an SVM-based classifier is used to select the data whose natural labels are consistent with the predicted labels. Finally, the remaining data are manually examined to filter out the noisy data. Out of about 48K filtered Chinese microblogs, 39K microblogs are selected to form the final corpus, with the Kappa value reaching over 0.92 for the automatic parts and over 0.81 for the manual part. The proportion of automatic selection reaches 54.1%. Thus, the method can reduce about 44.5% of the manual workload for acquiring quality data. An experiment with a classifier trained on this corpus shows that it achieves results comparable to those obtained with the manually annotated NLP&CC2013 corpus. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), p. 1845-1849. Portorož, Slovenia : European Language Resources Association (ELRA), 2016 | -
dcterms.issued | 2016 | -
dc.relation.ispartofbook | Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) | -
dc.relation.conference | International Conference on Language Resources and Evaluation [LREC] |
dc.description.validate | 202402 bcch | -
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | COMP-1612 | en_US
dc.description.fundingSource | RGC | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | National Natural Science Foundation of China; The Hong Kong Polytechnic University | en_US
dc.description.pubStatus | Published | en_US
dc.identifier.OPUS | 9609356 | en_US
dc.description.oaCategory | CC | en_US
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
L16-1291.pdf | | 859.85 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.