Neural network techniques for graffiti interpretation and speech recognition

Leung, Koon-fai

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/86108

Title:	Neural network techniques for graffiti interpretation and speech recognition
Authors:	Leung, Koon-fai
Degree:	M.Phil.
Issue Date:	2004
Abstract:	This thesis explores neural network classification techniques for an electronic book (eBook) reading device. Two areas of application are addressed: a graffiti interpreter and a Cantonese-speech recognizer. Different structures of neural networks and hybrid neural networks incorporating fuzzy sets are used to realize the applications. An eBook reading device enhances the reading environment with interactive and multimedia features. Input to this device can be made using a stylus on a touch-screen or by voice through a microphone; in practice, the former is a pattern recognition (graffiti interpretation) problem and the latter is a speech recognition problem. With graffiti interpretation, eBook users can take full advantage of graffiti input to issue commands or enter text. The interpretation is done by a template matching technique. Two approaches are developed to realize the pattern recognition: a self-structured neural network and a self-structured neural-fuzzy network. Improved from a 3-layer fully connected neural network/neural-fuzzy network, the self-structured network has a variable structure that adapts to the characteristics of the input patterns by incorporating link switches. By properly determining the states of the link switches through training, the dummy links can be eliminated. Simulation results show that the self-structured network performs better than a fixed-structure network in terms of network size. With a speech recognizer, eBook users can use natural speech to execute some functions of the eBook and enter characters whenever necessary. Four approaches are proposed to recognize Cantonese speech. Of them, three are feed-forward neural networks, and one is a recurrent neural network. As the first approach, the self-structured neural-fuzzy network used for graffiti interpretation is also applied to recognize Cantonese-speech commands.
Then, a neural-fuzzy network and a neural network are modified by adding associative memory to provide the network parameters. In both of these approaches, the neural-fuzzy network/neural network effectively has variable parameters that change with respect to the input patterns. Thus, the learning ability can be enhanced when two feature vectors belong to the same class but are sparsely distributed. Results will be given to demonstrate the improvement in recognition accuracy, network complexity and learning rate. A discussion comparing the various approaches will also be given. By using a recurrent neural network, the sequential properties of double-syllable Cantonese digits can be modeled. The fourth approach therefore involves an associative memory for a recurrent neural network. Results will be given to demonstrate the merits of the proposed approach. A discussion comparing the static approaches with the dynamic approach will also be given. In this thesis, all neural networks are trained by an improved genetic algorithm (GA). The details of this algorithm and its performance on some benchmark test functions will be given in the Appendix.
Subjects:	Hong Kong Polytechnic University -- Dissertations; Neural networks (Computer science); Electronic books; Pattern perception; Speech perception
Pages:	xix, 121 leaves : ill. ; 30 cm
Appears in Collections:	Thesis
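
The abstract's idea of link switches can be illustrated with a minimal sketch. It is not taken from the thesis: the unit-step form of the switch, the tanh activation, and every name and number below are assumptions made for illustration. Each link carries a switch parameter alongside its weight, and a link whose switch is off contributes nothing and can be pruned as a dummy link. In the thesis the switch states are determined through training with an improved genetic algorithm; here they are simply fixed by hand.

```python
import math

def switch(s):
    """Assumed unit-step link switch: the link is kept when s >= 0."""
    return 1.0 if s >= 0.0 else 0.0

def layer(inputs, weights, switches, biases):
    """One fully connected layer whose links are gated by switches."""
    outputs = []
    for w_row, s_row, b in zip(weights, switches, biases):
        # A link with an "off" switch adds nothing to the weighted sum.
        total = b + sum(switch(s) * w * x
                        for w, s, x in zip(w_row, s_row, inputs))
        outputs.append(math.tanh(total))
    return outputs

def active_links(switches):
    """Count the links that remain after dummy links are switched off."""
    return sum(int(switch(s)) for row in switches for s in row)

# Hypothetical 2-input, 2-neuron layer with hand-picked values.
example_inputs = [1.0, 2.0]
example_weights = [[0.5, -0.3], [0.1, 0.4]]
example_switches = [[1.0, -1.0], [-0.5, 0.2]]  # negative value = link off
example_biases = [0.0, 0.0]

example_outputs = layer(example_inputs, example_weights,
                        example_switches, example_biases)
remaining = active_links(example_switches)
```

With these values, two of the four links stay active, so the layer behaves like a smaller network than its fully connected counterpart.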

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/3053
