Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/65194
Title: Automatic word segmentation for spoken Cantonese
Authors: Fung, SYR 
Bigi, B
Keywords: Corpus
Segmentation
Automatic
Cantonese
Software
Issue Date: 2015
Publisher: Institute of Electrical and Electronics Engineers
Source: The 18th Oriental COCOSDA / CASLRE : 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) : proceedings : Shanghai Jiao Tong University, Shanghai, Oct. 28-30, 2015, p. 196-201
Abstract: Though Cantonese is the most influential variety of Chinese other than Mandarin, only a limited number of Cantonese corpora are available for linguistic studies. Among the essential steps of building a corpus, word segmentation is a necessary but highly challenging task because Cantonese lacks clear word boundaries. This paper reports the construction and evaluation of an open-source automatic word segmenter for Cantonese. The tool is a component of the multilingual SPPAS program and is designed to be used directly by linguists. It is free software distributed under a GPL license. The effectiveness of the tool was evaluated by comparing manual segmentation of samples from a spoken Cantonese corpus with the automatic segmentation produced by the tool. High precision and recall were found in our study. Upon completion, the tool is expected to promote the development of more Cantonese corpora for language-related studies.
URI: http://hdl.handle.net/10397/65194
ISBN: 978-1-4673-8279-3 (electronic)
978-1-4673-8278-6 (USB)
978-1-4673-8280-9 (print on demand (PoD))
DOI: 10.1109/ICSDA.2015.7357891
Appears in Collections: Conference Paper

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.