Please use this identifier to cite or link to this item:
Title: Building a collation element table for a large Chinese character set in YES
Authors: Zhang, X 
Li, X
Keywords: Chinese characters
Issue Date: 2015
Publisher: Springer
Source: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics) How to cite?
Journal: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics) 
Abstract: YES is a simplified stroke-based method for sorting Chinese characters. It is free from stroke counting and grouping, and thus much faster and more accurate than the traditional method. This paper presents a collation element table built in YES for a large joint Chinese character set covering (a) all 20,902 characters of Unicode CJK Unified Ideographs, (b) all 11,408 characters in the Complete List of Chinese Characters Used by the Media in 2013, (c) all 13,000 plus characters in the latest versions of Xinhua Dictionary(v11) and Contemporary Chinese Dictionary(v6). Of the 20,902 Chinese characters in Unicode, 97.23% have one-to-one relationship with their stroke order codes in YES, comparing with 90.69% of the traditional method. Enhanced with the secondary and tertiary sorting levels of stroke layout and Unicode value, there is a guarantee of one-to-one relationship between the characters and collation elements. The collation element table has been successfully applied to sorting CC-CEDICT, a Chinese-English dictionary of over 112,000 word entries.
Description: 14th China National Conference, CCL 2015, and third International Symposium, NLP-NABD 2015
ISBN: 9783319258157
ISSN: 0302-9743
EISSN: 1611-3349
DOI: 10.1007/978-3-319-25816-4_1
Appears in Collections:Conference Paper

View full-text via PolyU eLinks SFX Query
Show full item record

Page view(s)

Last Week
Last month
Citations as of Nov 12, 2018

Google ScholarTM



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.