Title: Automatic word segmentation for spoken Cantonese
Authors: Fung, SYR 
Bigi, B
Keywords: Corpus
Issue Date: 2015
Publisher: Institute of Electrical and Electronics Engineers
Source: The 18th Oriental COCOSDA / CASLRE : 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) : proceedings : Shanghai Jiao Tong University, Shanghai, Oct. 28-30, 2015, p. 196-201
Abstract: Though Cantonese is the most influential variety of Chinese other than Mandarin, only a limited number of Cantonese corpora are available for linguistic studies. Among the essential steps in building a corpus, word segmentation is necessary but highly challenging because Cantonese lacks clear word boundaries. This paper reports the construction and evaluation of an open-source automatic word segmenter for Cantonese. The tool is a component of the multilingual SPPAS program designed to be used directly by linguists. It is free software distributed under a GPL license. The effectiveness of the tool was evaluated by segmenting samples of a spoken Cantonese corpus both manually and automatically with the tool, then comparing the results. High precision and recall were found in our study. Upon completion, the tool is expected to promote the development of more Cantonese corpora for language-related studies.
ISBN: 978-1-4673-8279-3 (electronic)
978-1-4673-8278-6 (USB)
978-1-4673-8280-9 (print on demand (PoD))
DOI: 10.1109/ICSDA.2015.7357891
Appears in Collections: Conference Paper

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.