Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/14241
Title: Fast human action classification and VOI localization with enhanced sparse coding
Authors: Lu, S
Zhang, J
Wang, Z
Feng, DD
Keywords: Human action classification
Localization
Sparse coding
Volume of Interest (VOI)
Issue Date: 2013
Publisher: Academic Press
Source: Journal of visual communication and image representation, 2013, v. 24, no. 2, p. 127-136
Journal: Journal of visual communication and image representation 
Abstract: Sparse coding, which encodes natural visual signals into a sparse space for visual codebook generation and feature quantization, has been successfully used in many image classification applications. However, it has seldom been explored for video analysis tasks. In particular, the increased complexity of characterizing the visual patterns of diverse human actions, with both spatial and temporal variations, poses additional challenges for the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme that learns a discriminative dictionary and optimizes the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. Using the sparse coding based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) for the actions. First, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Second, to avoid an exhaustive scan of entire videos for VOI localization, we extend Spatial Pyramid Matching into the temporal domain, as Spatial Temporal Pyramid Matching, to obtain the VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework also avoids prohibitive computations in local similarity matching (e.g., nearest neighbors voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and on the widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining classification accuracy comparable to that of the state-of-the-art methods.
URI: http://hdl.handle.net/10397/14241
ISSN: 1047-3203
DOI: 10.1016/j.jvcir.2012.07.008
Appears in Collections: Journal/Magazine Article

Scopus citations: 5 (last week: 0; last month: 0), as of Dec 6, 2017

Web of Science citations: 4 (last week: 0; last month: 0), as of Dec 3, 2017

Page views: 46 (last week: 1), checked on Dec 10, 2017

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.