Title: Multi-lingual (Cantonese, Mandarin and English) speech recognition and voice response system
Authors: Li, Nga-ling Bavy
Keywords: Automatic speech recognition; Speech processing systems; Hong Kong Polytechnic University -- Dissertations
Issue Date: 2001
Publisher: The Hong Kong Polytechnic University

Abstract:
As computer technology increasingly permeates our daily lives, hundreds of speech recognition applications are being implemented in business, industry, and customer service. Hong Kong is a multicultural city in which people use their native tongues to communicate within their own groups, so an application should support the three common dialects of Cantonese, Mandarin, and English. This thesis aims to build an integrated Automatic Speech Recognition (ASR) system for these three dialects without applying any prior linguistic knowledge. Constructing the system involves three essential phases: (1) speech segmentation, (2) speech preprocessing, and (3) speech recognition. The objectives of the thesis are: (1) to find a segmentation algorithm that works well for all three dialects without prior linguistic knowledge of any of them; (2) to evaluate how different existing parametric representations improve different speech recognition mechanisms for the three dialects; and (3) to design an integrated ASR system that produces better results across all three. The overall performance of the proposed segmentation algorithm and the proposed recognition algorithm was also measured by comparison with common existing algorithms. The experimental results show that the proposed Linguistically Free Segmentation (LFS) method is much more stable than the traditional Zero Crossing method, judging by their standard deviations. They also show that different parametric representations yield varied degrees of improvement for different recognition mechanisms across the three dialects.
The best performance for recognizing Cantonese was achieved by applying Mel-Frequency Cepstral Coefficient (MFCC) features to Improved Naive Bayesian Classification (INBC), whereas the best performance for recognizing Mandarin and English was achieved by applying MFCC features to Hidden Markov Modeling (HMM) with the Viterbi algorithm. The results indicate that an integrated ASR system (composing different algorithms from the segmentation, preprocessing, and recognition phases) is needed to construct a reliable speech understanding system for the different spoken languages in society. Finally, the integrated ASR system for the three studied dialects was demonstrated with a Zoological Fortune Telling application. We believe such an integrated ASR system can be applied to a Voice Response System, providing smart support for millions of business transactions and customer service enquiries every day. Such a system can improve traditional human-computer interaction by letting users retrieve or manipulate different forms of application output (speech, text, graphics, or sets of actions). This is set as a future enhancement of the integrated ASR system and is not emphasized in this thesis.

Description: viii, 91 leaves : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577M COMP 2001 Li
URI: http://hdl.handle.net/10397/3037
Rights: All rights reserved.
Appears in Collections: Thesis
Files in This Item:
b15995288_link.htm (For PolyU Users, 162 B, HTML)
b15995288_ir.pdf (For All Users, Non-printable, 3.13 MB, Adobe PDF)
Citations as of Mar 19, 2018
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.