Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/71520
DC Field | Value | Language
dc.contributor | Department of Chinese and Bilingual Studies | -
dc.creator | Zhang, X | -
dc.date.accessioned | 2017-12-28T06:26:14Z | -
dc.date.available | 2017-12-28T06:26:14Z | -
dc.identifier.issn | 1003-0077 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/71520 | -
dc.language.iso | zh | en_US
dc.publisher | 中国中文信息学会 ; 北京信息工程学院 | en_US
dc.rights | © 2015 中国学术期刊电子杂志出版社。本内容的使用仅限于教育、科研之目的。 | en_US
dc.rights | © 2015 China Academic Journal Electronic Publishing House. It is to be used strictly for educational and research purposes. | en_US
dc.subject | Chinese characters | en_US
dc.subject | Duplicate encoding | en_US
dc.subject | Unicode | en_US
dc.title | Duplicate encoding of Chinese characters | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 144 | en_US
dc.identifier.epage | 150 | en_US
dc.identifier.volume | 29 | en_US
dc.identifier.issue | 4 | en_US
dcterms.abstract | 同一个字符拥有不同的计算机内部代码,这意味着有两个或两个以上字形在人的眼中是同一个字,而计算机却认为是不同的字。这种"人机看法不一致"会给语言信息处理带来混乱,导致信息检索不全,统计数字不准,字词分类排序不一致等情况。该文结合Unicode实例专题讨论当前计算机上存在的中文同形异码字问题,包括(a)私人造字公有化所形成的同形异码字,(b)兼容编码所形成的同形异码字,(c)建立专门的笔画部首表而形成的同形异码字,(d)半宽和全宽字形分别编码而造成的同形异码字等,并探讨解决问题的方法。 | -
dcterms.abstract | A duplicate-encoded character is a character which has been assigned two or more code points in a coding system such as Unicode. When output in distinct codes, the glyphs of a duplicate-encoded character appear the same to human users, while in the computer, they are different characters. Such a human-computer inconsistency would cause confusion in language information processing, resulting in incomplete information retrieval, inaccurate statistic calculation, and inferior quality of data sorting and categorizing. This paper discusses duplicate encoding of Chinese characters in Unicode, MS Office and the WWW, including (a) duplicate encoding arising from new code assignment in the Unihan public area to characters already encoded in the private use area, (b) duplicate encoding caused by compatibility encoding, (c) duplicate encoding brought forward by building dedicated lists for CJK strokes and radicals, and (d) duplicate encoding of characters in half-width and full-width forms. Some effective solutions to the problems are also suggested. | -
dcterms.accessRights | open access | en_US
dcterms.alternative | 中文的同形异码字问题 | -
dcterms.bibliographicCitation | 中文信息学报 (Journal of Chinese information processing), 2015, v. 29, no. 4, p. 144-150 | -
dcterms.isPartOf | 中文信息学报 (Journal of Chinese information processing) | -
dcterms.issued | 2015 | -
dc.identifier.rosgroupid | 2015000042 | -
dc.description.ros | 2015-2016 > Academic research: refereed > Publication in refereed journal | en_US
dc.description.ros | 2015-2016 > Academic research: refereed > Publication in refereed journal | -
dc.description.validate | bcma | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_IR/PIRA | en_US
dc.description.pubStatus | Published | en_US
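
The abstract's categories (b), (c) and (d) can be illustrated with a minimal sketch. It is not taken from the paper and is not necessarily one of the solutions the paper proposes; it only assumes Python's standard `unicodedata` module and standard NFKC normalization. The sample pairs and the helper names `fold` and `same_after_folding` are illustrative choices. Normalization does not cover every case: private-use code points (category (a)) carry no decomposition data, and many CJK stroke and radical code points have no compatibility mapping either.

```python
import unicodedata

# Illustrative pairs chosen for this sketch (not from the paper):
# (label, variant code point, ordinary code point)
PAIRS = [
    ("compatibility ideograph", "\uF900", "\u8C48"),  # U+F900 vs U+8C48
    ("Kangxi radical", "\u2F00", "\u4E00"),           # U+2F00 vs U+4E00
    ("full-width Latin letter", "\uFF21", "A"),       # U+FF21 vs U+0041
    ("half-width katakana", "\uFF76", "\u30AB"),      # U+FF76 vs U+30AB
]


def fold(text: str) -> str:
    """Fold compatibility variants onto one code point using NFKC."""
    return unicodedata.normalize("NFKC", text)


def same_after_folding(a: str, b: str) -> bool:
    """Return True if two strings become identical after NFKC folding."""
    return fold(a) == fold(b)


if __name__ == "__main__":
    for label, variant, ordinary in PAIRS:
        print(
            f"{label}: U+{ord(variant):04X} vs U+{ord(ordinary):04X}, "
            f"raw equal: {variant == ordinary}, "
            f"equal after NFKC: {same_after_folding(variant, ordinary)}"
        )
```

Folding both the query and the indexed text with the same function before comparison is one way to reduce the incomplete-retrieval and inconsistent-sorting effects the abstract describes, at the cost of discarding distinctions that some applications may want to keep.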
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
2015000042.pdf | | 619.59 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.