Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/97934
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Computing | - |
| dc.creator | Xiang, Wangmeng | - |
| dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/12263 | - |
| dc.language.iso | English | - |
| dc.title | Towards efficient and reliable human activity understanding | - |
| dc.type | Thesis | - |
| dcterms.abstract | Human activity understanding has been an active research area due to its wide range of applications, e.g., sports analysis, healthcare, security monitoring, environmental protection, entertainment, self-driving vehicles, and human-computer interaction. Generally speaking, understanding human activities requires answering "who (person re-identification) is doing what (action recognition)". In this thesis, we investigate efficient and reliable methodologies for person re-identification and action recognition. | - |
| dcterms.abstract | To reliably recognize human identity, in chapter 2 we propose a novel Part-aware Attention Network (PAN) for person re-identification, which uses part feature maps as queries to perform second-order information propagation from middle-level features (see the attention sketch after the metadata table). PAN operates on all spatial positions of the feature maps, so it can discover long-distance relations. Considering that hard negative samples have a huge impact on action recognition performance, in chapter 3 we propose a Common Daily Action Dataset (CDAD), which contains positive and negative action pairs for reliable daily action understanding. The established CDAD dataset not only serves as a benchmark for several important daily action understanding tasks, including multi-label action recognition, temporal action localization, and spatiotemporal action detection, but also provides a testbed for researchers to investigate the influence of highly similar negative samples on learning action understanding models. | - |
| dcterms.abstract | Efficiently and effectively modeling the 3D self-attention of video data has been a great challenge for transformer-based action recognition. In chapter 4, we propose Temporal Patch Shift (TPS) for efficient spatiotemporal self-attention modeling, which greatly increases the temporal modeling ability of 2D transformers without additional computation cost (see the patch-shift sketch after the metadata table). Previous skeleton-based action recognition methods are typically formulated as classification over one-hot labels, without fully utilizing the semantic relations between actions. To fully explore the action prior knowledge contained in language, in chapter 5 we propose Language Supervised Training (LST) for skeleton-based action recognition. More specifically, we take a large-scale language model as the knowledge engine to provide text descriptions for body parts' actions, and we apply a multi-modal training scheme to supervise the skeleton encoder for action representation learning (see the contrastive-training sketch after the metadata table). | - |
| dcterms.abstract | In summary, this thesis presents three methods and one dataset for efficient and reliable human activity understanding. Among them, PAN uses part features to aggregate information from the mid-level features of a CNN for person re-identification; CDAD collects positive and negative action pairs for reliable action recognition; TPS applies a patch shift operation for efficient spatiotemporal modeling in transformers for video action recognition; and LST deploys human part language descriptions to guide skeleton-based action recognition. Extensive experiments demonstrate their efficiency and reliability for human activity understanding. | - |
| dcterms.accessRights | open access | - |
| dcterms.educationLevel | Ph.D. | - |
| dcterms.extent | xv, 146 pages : color illustrations | - |
| dcterms.issued | 2023 | - |
| dcterms.LCSH | Computer vision | - |
| dcterms.LCSH | Image analysis | - |
| dcterms.LCSH | Motion perception (Vision) | - |
| dcterms.LCSH | Pattern recognition systems | - |
| dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | - |
| Appears in Collections: | Thesis | |
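
The abstract's description of PAN (chapter 2) suggests an attention step in which pooled part features act as queries over every spatial position of a mid-level CNN feature map. The PyTorch sketch below is only one plausible reading of that sentence: the stripe pooling used to form part queries, the layer names, and the tensor shapes are all assumptions, not the thesis's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartAwareAttention(nn.Module):
    """Part features as queries, every spatial position of a mid-level
    feature map as keys/values, so relations are not limited to local
    neighbourhoods (hypothetical reading of PAN's attention step)."""
    def __init__(self, dim, num_parts=6):
        super().__init__()
        self.num_parts = num_parts
        self.q = nn.Linear(dim, dim)
        self.k = nn.Conv2d(dim, dim, kernel_size=1)
        self.v = nn.Conv2d(dim, dim, kernel_size=1)
        self.scale = dim ** -0.5

    def forward(self, mid_feat):
        # mid_feat: (B, C, H, W) mid-level CNN feature map
        b, c, h, w = mid_feat.shape
        # crude part queries: average-pool P horizontal stripes (assumption)
        parts = F.adaptive_avg_pool2d(mid_feat, (self.num_parts, 1))
        parts = parts.squeeze(-1).transpose(1, 2)        # (B, P, C)
        q = self.q(parts)                                # (B, P, C)
        k = self.k(mid_feat).flatten(2)                  # (B, C, H*W)
        v = self.v(mid_feat).flatten(2).transpose(1, 2)  # (B, H*W, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1) # attend to all positions
        return attn @ v                                  # (B, P, C) refined part features
```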
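
TPS (chapter 4) reads as a zero-parameter token rearrangement: a sparse subset of patch tokens is exchanged with neighbouring frames before ordinary 2D self-attention runs, so the spatial attention block sees temporal context at no extra computation. A minimal sketch, assuming a hypothetical shift pattern (the abstract does not specify the actual one):

```python
import torch

def temporal_patch_shift(x, stride=1):
    """x: (B, T, N, C) video tokens, T frames of N patches each.
    Swaps sparse patch subsets with neighbouring frames so that plain
    2D self-attention mixes temporal information for free."""
    shifted = x.clone()
    # hypothetical pattern: every 4th patch rolls forward one frame,
    # every 4th patch (offset by 2) rolls backward one frame
    shifted[:, :, 0::4] = torch.roll(x[:, :, 0::4], shifts=stride, dims=1)
    shifted[:, :, 2::4] = torch.roll(x[:, :, 2::4], shifts=-stride, dims=1)
    return shifted  # same shape; the downstream 2D attention block is unchanged
```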
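
LST (chapter 5) supervises the skeleton encoder with text descriptions of body-part actions generated by a large language model. A CLIP-style symmetric contrastive loss is one plausible form of the "multi-modal training scheme"; the sketch below assumes that formulation, with `skel_feats` and `text_feats` as hypothetical paired embeddings from the skeleton and text encoders.

```python
import torch
import torch.nn.functional as F

def lst_contrastive_loss(skel_feats, text_feats, temperature=0.07):
    """skel_feats, text_feats: (B, D) paired embeddings.
    Symmetric InfoNCE: each skeleton should match its own part-level
    text description and vice versa (assumed objective, not the
    thesis's exact loss)."""
    s = F.normalize(skel_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    logits = s @ t.T / temperature                      # (B, B) similarities
    targets = torch.arange(s.size(0), device=s.device)  # diagonal is positive
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```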