Multi-lingual (Cantonese, Mandarin and English) speech recognition and voice response system

Li, Nga-ling Bavy

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/86109

Title:	Multi-lingual (Cantonese, Mandarin and English) speech recognition and voice response system
Authors:	Li, Nga-ling Bavy
Degree:	M.Phil.
Issue Date:	2001
Abstract:	As computer technology increasingly permeates our daily lives, hundreds of speech recognition applications are being deployed in business, industry and customer service. Hong Kong is a multicultural city in which people use their native tongues to communicate within their own groups, so speech applications there need to support the three common dialects of Cantonese, Mandarin and English. This thesis aims to build an integrated Automatic Speech Recognition (ASR) system for these three dialects without applying any prior linguistic knowledge. The construction of the system is studied in three essential phases: (1) Speech Segmentation, (2) Speech Preprocessing, and (3) Speech Recognition. The objectives of the thesis are: (1) to find a segmentation algorithm that works well for all three dialects without prior linguistic knowledge of any of them; (2) to examine how different existing parametric representations produce different ranges of improvement on different speech recognition mechanisms for the three dialects; and (3) to design an integrated ASR system that produces better results across the three dialects. The overall performance of the proposed segmentation algorithm and the proposed recognition algorithm is also measured against some common existing algorithms. The experimental results show that the proposed Linguistically Free Segmentation (LFS) method is much more stable than the traditional Zero Crossing method, as judged by their standard deviations. They also show that different existing parametric representations give varied ranges of improvement on different speech recognition mechanisms for the three dialects.
The best performance for recognizing Cantonese is achieved by feeding Mel-frequency Cepstral Coefficient (MFCC) features into Improved Naive Bayesian Classification (INBC), whereas the best performance for recognizing Mandarin and English is achieved by feeding MFCC features into Hidden Markov Modeling (HMM) with the Viterbi algorithm. These results indicate that an integrated ASR system (a composition of different algorithms from the segmentation, preprocessing, and recognition phases) is needed to construct a reliable speech understanding system for the different spoken languages in society. Finally, the integrated ASR system for the three studied dialects is put to use in a Zoological Fortune Telling application. We believe that an integrated ASR system can be applied to a Voice Response System, which can provide smart support for millions of business transactions or customer service enquiries every day. Such a system can improve traditional human-computer interaction by permitting users to retrieve or manipulate different forms of output (speech, text, graphics, or sets of actions) from applications. This part is left as a future enhancement of the integrated ASR system and is not emphasized in this thesis.
Subjects:	Automatic speech recognition; Speech processing systems; Hong Kong Polytechnic University -- Dissertations
Pages:	viii, 91 leaves : ill. ; 30 cm
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/4905
