Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/114024
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Chinese and Bilingual Studies | - |
dc.creator | Qiu, L | - |
dc.creator | Guo, S | - |
dc.creator | Wong, TS | - |
dc.creator | Chersoni, E | - |
dc.creator | Lee, J | - |
dc.creator | Huang, CR | - |
dc.date.accessioned | 2025-07-10T01:31:44Z | - |
dc.date.available | 2025-07-10T01:31:44Z | - |
dc.identifier.isbn | 979-8-89176-176-6 | - |
dc.identifier.uri | http://hdl.handle.net/10397/114024 | - |
dc.description | Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), Miami, Florida, USA, 15 November 2024 | en_US |
dc.language.iso | en | en_US |
dc.publisher | Association for Computational Linguistics | en_US |
dc.rights | ©2024 Association for Computational Linguistics | en_US |
dc.rights | ACL materials are Copyright © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. | en_US |
dc.rights | The following publication Le Qiu, Shanyue Guo, Tak-Sum Wong, Emmanuele Chersoni, John Lee, and Chu-Ren Huang. 2024. CompLex-ZH: A New Dataset for Lexical Complexity Prediction in Mandarin and Cantonese. In Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), pages 20–26, Miami, Florida, USA. Association for Computational Linguistics is available at https://doi.org/10.18653/v1/2024.tsar-1.3. | en_US |
dc.title | CompLex-ZH : a new dataset for lexical complexity prediction in Mandarin and Cantonese | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.spage | 20 | - |
dc.identifier.epage | 26 | - |
dc.identifier.doi | 10.18653/v1/2024.tsar-1.3 | - |
dcterms.abstract | The prediction of lexical complexity in context is assuming an increasing relevance in Natural Language Processing research, since identifying complex words is often the first step of text simplification pipelines. To the best of our knowledge, though, datasets annotated with complex words are available only for English and for a limited number of Western languages. In our paper, we introduce CompLex-ZH, a dataset including words annotated with complexity scores in sentential contexts for Chinese. Our data include sentences in Mandarin and Cantonese, which were selected from a variety of sources and textual genres. We provide a first evaluation with baselines combining hand-crafted and language models-based features. | - |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | In Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), p. 20–26. Miami, Florida, USA: Association for Computational Linguistics, 2024 | - |
dcterms.issued | 2024 | - |
dc.relation.ispartofbook | Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024) | - |
dc.relation.conference | Workshop on Text Simplification, Accessibility and Readability [TSAR] | - |
dc.description.validate | 202507 bcwh | - |
dc.description.oa | Version of Record | en_US |
dc.identifier.FolderNumber | a3877 | en_US |
dc.identifier.SubFormID | 51498 | en_US |
dc.description.fundingSource | Others | en_US |
dc.description.fundingText | Faculty of Humanities of the Hong Kong Polytechnic University | en_US |
dc.description.pubStatus | Published | en_US |
dc.description.oaCategory | CC | en_US |
Appears in Collections: | Conference Paper |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
2024.tsar-1.3.pdf | | 432.26 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.