Emotion corpus construction based on selection from hashtags

Li, M; Long, Y; Qin, L; Li, W

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/105726

DC Field	Value	Language
dc.contributor	Department of Computing	-
dc.creator	Li, M	-
dc.creator	Long, Y	-
dc.creator	Qin, L	-
dc.creator	Li, W	-
dc.date.accessioned	2024-04-15T07:36:15Z	-
dc.date.available	2024-04-15T07:36:15Z	-
dc.identifier.isbn	978-2-9517408-9-1	-
dc.identifier.uri	http://hdl.handle.net/10397/105726	-
dc.description	Tenth International Conference on Language Resources and Evaluation (LREC'16), May 23-28, 2016, Portorož, Slovenia	en_US
dc.language.iso	en	en_US
dc.publisher	European Language Resources Association (ELRA)	en_US
dc.rights	Copyright by the European Language Resources Association	en_US
dc.rights	The LREC 2016 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/)	en_US
dc.rights	The following publication is available at https://aclanthology.org/L16-1291/: Minglei Li, Yunfei Long, Lu Qin, and Wenjie Li. 2016. Emotion Corpus Construction Based on Selection from Hashtags. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1845–1849, Portorož, Slovenia. European Language Resources Association (ELRA).	en_US
dc.title	Emotion corpus construction based on selection from hashtags	en_US
dc.type	Conference Paper	en_US
dc.identifier.spage	1845	-
dc.identifier.epage	1849	-
dcterms.abstract	The availability of labelled corpora is of great importance for supervised learning in emotion classification tasks. Because manually labelling text is time-consuming, hashtags have been used as naturally annotated labels to obtain large amounts of labelled training data from microblogs. However, natural hashtags contain too much noise to be used directly in learning algorithms. In this paper, we design a three-stage semi-automatic method to construct an emotion corpus from microblogs. First, a lexicon-based voting approach is used to verify hashtags automatically. Second, an SVM-based classifier is used to select the data whose natural labels are consistent with the predicted labels. Finally, the remaining data are manually examined to filter out noisy data. Out of about 48K filtered Chinese microblogs, 39K microblogs are selected to form the final corpus, with Kappa values over 0.92 for the automatic parts and over 0.81 for the manual part. The proportion of automatic selection reaches 54.1%. Thus, the method can reduce the manual workload for acquiring quality data by about 44.5%. An experiment with a classifier trained on this corpus shows that it achieves results comparable to those obtained with the manually annotated NLP&CC2013 corpus.	-
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), p. 1845-1849. Portorož, Slovenia : European Language Resources Association (ELRA), 2016	-
dcterms.issued	2016	-
dc.relation.ispartofbook	Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)	-
dc.relation.conference	International Conference on Language Resources and Evaluation [LREC]	-
dc.description.validate	202402 bcch	-
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	COMP-1612	en_US
dc.description.fundingSource	RGC	en_US
dc.description.fundingSource	Others	en_US
dc.description.fundingText	National Natural Science Foundation of China; The Hong Kong Polytechnic University	en_US
dc.description.pubStatus	Published	en_US
dc.identifier.OPUS	9609356	en_US
dc.description.oaCategory	CC	en_US
Appears in Collections:	Conference Paper
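The three stages summarized in the abstract can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the toy English lexicon `LEXICON` (the paper works with Chinese microblogs), the majority-vote and tie rule in `lexicon_vote`, the stage ordering (the classifier only sees posts the lexicon could not verify), and `toy_predict`, a hypothetical stand-in for the trained SVM.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# A post is (tokens, hashtag_label); the label comes from the emotion hashtag.
Post = Tuple[List[str], str]

# Hypothetical toy lexicon mapping words to emotion categories.
LEXICON: Dict[str, str] = {
    "happy": "happiness", "glad": "happiness",
    "sad": "sadness", "cry": "sadness",
    "angry": "anger", "furious": "anger",
}


def lexicon_vote(tokens: List[str]) -> Optional[str]:
    """Majority vote over lexicon hits; None if there are no hits or a tie."""
    votes = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    if not votes:
        return None
    top = votes.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # tied vote: the lexicon cannot verify the hashtag
    return top[0][0]


def select(posts: List[Post], predict: Callable[[List[str]], str]):
    """Split posts into lexicon-verified, classifier-verified and manual-review lists."""
    by_lexicon: List[Post] = []
    by_classifier: List[Post] = []
    manual: List[Post] = []
    for tokens, label in posts:
        if lexicon_vote(tokens) == label:
            by_lexicon.append((tokens, label))  # stage 1: lexicon agrees with hashtag
        elif predict(tokens) == label:
            by_classifier.append((tokens, label))  # stage 2: classifier agrees with hashtag
        else:
            manual.append((tokens, label))  # stage 3: left for human annotators
    return by_lexicon, by_classifier, manual


# Hypothetical stand-in for a trained SVM classifier (assumption, not from the paper).
def toy_predict(tokens: List[str]) -> str:
    return "anger" if "traffic" in tokens else "sadness"


posts: List[Post] = [
    (["so", "happy", "today"], "happiness"),
    (["stuck", "in", "traffic"], "anger"),
    (["happy", "but", "sad"], "sadness"),
    (["what", "a", "day"], "happiness"),
]
by_lexicon, by_classifier, manual = select(posts, toy_predict)
auto_share = (len(by_lexicon) + len(by_classifier)) / len(posts)
print(len(by_lexicon), len(by_classifier), len(manual), auto_share)  # 1 2 1 0.75
```

In a realistic setting `toy_predict` would be replaced by a trained classifier (for example scikit-learn's `LinearSVC`), and `auto_share` is the toy analogue of the "proportion of automatic selection" reported in the abstract.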

Files in This Item:

File	Description	Size	Format
L16-1291.pdf		859.85 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record
