Video-based pattern recognition by spatio-temporal modeling via multi-modality co-learning

Zheng, Haomian

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/84697

Title:	Video-based pattern recognition by spatio-temporal modeling via multi-modality co-learning
Authors:	Zheng, Haomian
Degree:	Ph.D.
Issue Date:	2012
Abstract:	The rapid growth of online video content makes it challenging to analyze, understand and process video content in real time. Video pattern recognition is emerging as an important research topic in computer vision and communication. Real-time applications such as Internet video searching and video surveillance are popular nowadays. Therefore effective and fast processing approaches are in high demand. Although traditional pattern recognition techniques can solve problems for text and images with satisfactory performance, they are subject to certain limitations when processing video due to the large amount of data and time complexity. On the other hand, some statistical models have been proposed for special video processing applications; however, they cannot handle the general video-based pattern recognition problem. In this thesis, we tackle these problems by addressing three key issues: feature extraction/video representation, indexing, and similarity measurement for classification. The feasibility of the proposed approaches is demonstrated through experiments on audio-visual speaker identification, video action recognition and gesture recognition. Firstly, we investigate the problem of video feature extraction and representation. Trajectories in high-dimensional space are used to represent the video clip, and global statistical features are extracted from the trajectory for classification. Based on such feature extraction, we propose two new approaches, Differential Luminance Field Trajectory (DLFT) and Luminance Aligned Projection Distance (LAPD), for the recognition task. For DLFT, we extract the differential signals as features, and then classify the action by supervised learning. For the LAPD approach, we define a new similarity measurement and compute a distance metric to describe the similarity between videos for classification. A potential fusion of the two methods yields more promising properties.
Experimental results demonstrate that the methods work effectively and efficiently. Secondly, we extend our work by utilizing local spatio-temporal features via indexing. Local features generally contain more statistical information for discrimination. We deal with the spatio-temporal modeling by partitioning the appearance space. The proposed approach can capture the discriminative information among different action classes. For the trajectory matching solution, we develop a query-driven dynamic appearance modeling method and use localized subspaces to obtain more reliable distances for discrimination. Flexibility is also guaranteed by introducing a warping scheme. The processing is implemented based on an indexing scheme, which is very fast in computation. Simulation results demonstrate the effectiveness of the solution. Thirdly, we focus on improving the pattern recognition performance by proposing novel learning methods. Considering the various features used for video representation, we aim to utilize multiple sets of features to jointly solve the recognition problem. We propose a multi-modality distance metric co-learning method. Two sets of different features are jointly utilized to generate a better description of the video clips. In this way the similarity between video clips is better evaluated and the recognition accuracy is improved. The effectiveness of the proposed method is demonstrated on audio-visual speaker identification. Furthermore, to demonstrate its robustness, the proposed method is also applied to digit recognition and text classification. Experimental results show that the proposed multi-modality method achieves better recognition accuracy than single-modality methods and other previous methods.
Subjects:	Digital video. Image processing -- Digital techniques. Pattern recognition systems. Hong Kong Polytechnic University -- Dissertations
Pages:	xiv, 105 p. : ill. ; 30 cm.
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/6971
