Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114024
DC Field | Value | Language
dc.contributor | Department of Chinese and Bilingual Studies | en_US
dc.creator | Qiu, L | en_US
dc.creator | Guo, S | en_US
dc.creator | Wong, TS | en_US
dc.creator | Chersoni, E | en_US
dc.creator | Lee, J | en_US
dc.creator | Huang, CR | en_US
dc.date.accessioned | 2025-07-10T01:31:44Z | -
dc.date.available | 2025-07-10T01:31:44Z | -
dc.identifier.isbn | 979-8-89176-176-6 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/114024 | -
dc.description | Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), Miami, Florida, USA, 15 November 2024 | en_US
dc.language.iso | en | en_US
dc.publisher | Association for Computational Linguistics | en_US
dc.rights | ©2024 Association for Computational Linguistics | en_US
dc.rights | ACL materials are Copyright © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. | en_US
dc.rights | The following publication Le Qiu, Shanyue Guo, Tak-Sum Wong, Emmanuele Chersoni, John Lee, and Chu-Ren Huang. 2024. CompLex-ZH: A New Dataset for Lexical Complexity Prediction in Mandarin and Cantonese. In Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), pages 20–26, Miami, Florida, USA. Association for Computational Linguistics is available at https://doi.org/10.18653/v1/2024.tsar-1.3. | en_US
dc.title | CompLex-ZH : a new dataset for lexical complexity prediction in Mandarin and Cantonese | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 20 | en_US
dc.identifier.epage | 26 | en_US
dc.identifier.doi | 10.18653/v1/2024.tsar-1.3 | en_US
dcterms.abstract | The prediction of lexical complexity in context is assuming an increasing relevance in Natural Language Processing research, since identifying complex words is often the first step of text simplification pipelines. To the best of our knowledge, though, datasets annotated with complex words are available only for English and for a limited number of Western languages. In our paper, we introduce CompLex-ZH, a dataset including words annotated with complexity scores in sentential contexts for Chinese. Our data include sentences in Mandarin and Cantonese, which were selected from a variety of sources and textual genres. We provide a first evaluation with baselines combining hand-crafted and language models-based features. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | In Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), p. 20–26. Miami, Florida, USA: Association for Computational Linguistics, 2024 | en_US
dcterms.issued | 2024 | -
dc.relation.ispartofbook | Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024) | en_US
dc.relation.conference | Workshop on Text Simplification, Accessibility and Readability [TSAR] | en_US
dc.description.validate | 202507 bcwh | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a3877 | -
dc.identifier.SubFormID | 51498 | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | Faculty of Humanities of the Hong Kong Polytechnic University | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
dc.relation.rdata | https://github.com/Laniqiu/CompLex-ZH | en_US
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
2024.tsar-1.3.pdf |  | 432.26 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Page views: 174 (as of Feb 9, 2026)
Downloads: 67 (as of Feb 9, 2026)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.