Multi-lingual (Cantonese, Mandarin and English) speech recognition and voice response system

Li, Nga-ling Bavy

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/86109

DC Field	Value	Language
dc.contributor	Department of Computing	-
dc.creator	Li, Nga-ling Bavy	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/4905	-
dc.language.iso	English	-
dc.title	Multi-lingual (Cantonese, Mandarin and English) speech recognition and voice response system	-
dc.type	Thesis	-
dcterms.abstract	As computer technology increasingly permeates daily life, hundreds of speech recognition applications are being deployed in business, industry and customer services. Hong Kong is a multicultural city in which people communicate within their own groups in their native tongues, so a speech system serving it should support the three commonly used languages: Cantonese, Mandarin and English. This thesis aims to build an integrated Automatic Speech Recognition (ASR) system for these three languages without applying any prior linguistic knowledge. Constructing the system involves three essential phases, each studied in this thesis: (1) speech segmentation, (2) speech preprocessing, and (3) speech recognition. The objectives of the thesis are: (1) to find a segmentation algorithm that works well for all three languages without prior linguistic knowledge of any of them; (2) to apply different existing parametric representations to different speech recognition mechanisms and measure the range of improvement each yields for the three languages; and (3) to design an integrated ASR system that produces better results across the three languages. The overall performance of the proposed segmentation algorithm and the proposed recognition algorithm is measured through comparison with common existing algorithms. The experimental results show that the proposed Linguistically Free Segmentation (LFS) method is much more stable than the traditional Zero Crossing method, judged by their standard deviations. They also show that different existing parametric representations give varying degrees of improvement on different speech recognition mechanisms for the three languages.
The best performance for recognizing Cantonese is achieved by feeding Mel-frequency Cepstral Coefficient (MFCC) features into Improved Naive Bayesian Classification (INBC), whereas the best performance for recognizing Mandarin and English is achieved by feeding MFCC features into Hidden Markov Modeling (HMM) with the Viterbi algorithm. These results indicate that an integrated ASR system, combining different algorithms across the segmentation, preprocessing and recognition phases, is needed to build a reliable speech understanding system for the different spoken languages used in society. Finally, the integrated ASR system for the three languages was applied in a Zoological Fortune Telling application. We believe the integrated ASR system can be extended into a Voice Response System that provides smart support for millions of business transactions and customer-service enquiries every day. Such a system could improve traditional human-computer interaction by letting users retrieve or manipulate application output in different forms (speech, text, graphics, or sets of actions). This extension is left as future work and is not emphasized in this thesis.	-
dcterms.accessRights	open access	-
dcterms.educationLevel	M.Phil.	-
dcterms.extent	viii, 91 leaves : ill. ; 30 cm	-
dcterms.issued	2001	-
dcterms.LCSH	Automatic speech recognition	-
dcterms.LCSH	Speech processing systems	-
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	-
Appears in Collections:	Thesis
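The abstract reports that MFCC features decoded by an HMM with the Viterbi algorithm gave the best results for Mandarin and English. As a rough illustration of that decoding step only, here is a generic, textbook Viterbi decoder; it is not the thesis's implementation. It assumes discrete observation symbols (for example, MFCC frames after vector quantization) and toy probabilities, and the function name `viterbi` is hypothetical.

```python
import math
from typing import Sequence


def viterbi(obs: Sequence[int],
            start_p: Sequence[float],
            trans_p: Sequence[Sequence[float]],
            emit_p: Sequence[Sequence[float]]) -> tuple[list[int], float]:
    """Return the most likely hidden-state path and its log-probability.

    obs     -- observation symbol indices (assumed: quantized feature frames)
    start_p -- start_p[i]: probability of starting in state i
    trans_p -- trans_p[i][j]: probability of moving from state i to state j
    emit_p  -- emit_p[i][k]: probability that state i emits symbol k
    """
    def log(p: float) -> float:
        # log(0) -> -inf so impossible paths can never win the max
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start_p)
    # delta[i]: best log-probability of any path ending in state i at this frame
    delta = [log(start_p[i]) + log(emit_p[i][obs[0]]) for i in range(n_states)]
    backptrs: list[list[int]] = []

    for symbol in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for state j at this frame
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log(trans_p[i][j]))
            new_delta.append(delta[best_i] + log(trans_p[best_i][j])
                             + log(emit_p[j][symbol]))
            ptr.append(best_i)
        delta = new_delta
        backptrs.append(ptr)

    # Trace back from the best final state to recover the full path
    last = max(range(n_states), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    # Toy 3-state left-to-right model (illustrative numbers only)
    start = [1.0, 0.0, 0.0]
    trans = [[0.6, 0.4, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
    emit = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    best_path, log_prob = viterbi([0, 0, 1, 1, 2], start, trans, emit)
    print(best_path)  # → [0, 0, 1, 1, 2]
```

Working in log probabilities keeps long utterances from underflowing to zero. The left-to-right transition matrix in the demo is a common choice for word models, but the abstract does not state which HMM topology or feature quantization the thesis used.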

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/4905
