Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/97934
dc.contributor: Department of Computing
dc.creator: Xiang, Wangmeng
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/12263
dc.language.iso: English
dc.title: Towards efficient and reliable human activity understanding
dc.type: Thesis
dcterms.abstract: Human activity understanding has been an active research area due to its wide range of applications, e.g., sports analysis, healthcare, security monitoring, environmental protection, entertainment, self-driving vehicles, and human-computer interaction. Generally speaking, understanding human activities requires us to answer "who (person re-identification) is doing what (action recognition)". In this thesis, we aim to investigate efficient and reliable methodologies for person re-identification and action recognition.
dcterms.abstract: To reliably recognize human identity, in Chapter 2 we propose a novel Part-aware Attention Network (PAN) for person re-identification, which uses part feature maps as queries to perform second-order information propagation from middle-level features. PAN operates on all spatial positions of the feature maps, so it can discover long-distance relations. Considering that hard negative samples have a huge impact on action recognition performance, in Chapter 3 we propose the Common Daily Action Dataset (CDAD), which contains positive and negative action pairs for reliable daily action understanding. CDAD not only serves as a benchmark for several important daily action understanding tasks, including multi-label action recognition, temporal action localization, and spatial-temporal action detection, but also provides a testbed for investigating the influence of highly similar negative samples on learning action understanding models.
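As an illustration of the part-query idea described in the paragraph above, the following is a minimal sketch (not the thesis implementation; the module name, shapes, and single-head design are assumptions) of an attention step in which pooled part features query every spatial position of a middle-level feature map:

import torch
import torch.nn as nn

class PartQueryAttention(nn.Module):
    """Hypothetical sketch: part features attend over all positions of a mid-level map."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)      # project pooled part features to queries
        self.k = nn.Conv2d(dim, dim, 1)   # project the mid-level map to keys
        self.v = nn.Conv2d(dim, dim, 1)   # project the mid-level map to values
        self.scale = dim ** -0.5

    def forward(self, part_feats, mid_feats):
        # part_feats: (B, P, C) pooled part features used as queries
        # mid_feats:  (B, C, H, W) middle-level feature map providing keys/values
        q = self.q(part_feats)                                    # (B, P, C)
        k = self.k(mid_feats).flatten(2).transpose(1, 2)          # (B, H*W, C)
        v = self.v(mid_feats).flatten(2).transpose(1, 2)          # (B, H*W, C)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, P, H*W)
        return part_feats + attn @ v      # residual update of the part queries

Because every spatial position contributes a key and a value, the updated part features can pick up relations between arbitrarily distant locations, which matches the long-distance property described for PAN.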
dcterms.abstract: Efficiently and effectively modeling the 3D self-attention of video data has been a great challenge for transformer-based action recognition. In Chapter 4, we propose Temporal Patch Shift (TPS) for efficient spatiotemporal self-attention modeling, which largely increases the temporal modeling ability of 2D transformers without additional computation cost. Previous skeleton-based action recognition methods are typically formulated as classification over one-hot labels, without fully utilizing the semantic relations between actions. To fully exploit the action prior knowledge contained in language, in Chapter 5 we propose Language Supervised Training (LST) for skeleton-based action recognition. Specifically, we take a large-scale language model as the knowledge engine to provide text descriptions of body-part movements and apply a multi-modal training scheme to supervise the skeleton encoder for action representation learning.
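The patch-shift idea can be illustrated with a short, hedged sketch (the token layout, stride, and shift pattern below are assumptions for illustration, not the official TPS configuration): selected spatial patch positions exchange their tokens with neighbouring frames before an ordinary per-frame 2D self-attention is applied, so temporal information is mixed in without extra attention computation.

import torch

def temporal_patch_shift(tokens: torch.Tensor, stride: int = 4) -> torch.Tensor:
    # tokens: (B, T, N, C) -- batch, frames, spatial patches per frame, channels
    shifted = tokens.clone()
    # every `stride`-th patch position takes its token from the previous frame,
    # the next position takes it from the following frame (circular along time)
    shifted[:, :, 0::stride] = torch.roll(tokens[:, :, 0::stride], shifts=1, dims=1)
    shifted[:, :, 1::stride] = torch.roll(tokens[:, :, 1::stride], shifts=-1, dims=1)
    return shifted  # a plain per-frame 2D self-attention is then run on `shifted`

Since the shift is a pure re-indexing of existing tokens, it adds no parameters and essentially no FLOPs, which is how a 2D transformer can gain temporal modeling ability at negligible cost.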
dcterms.abstract: In summary, this thesis presents three methods and one dataset for efficient and reliable human activity understanding. Among them, PAN uses part features to aggregate information from mid-level CNN features for person re-identification; CDAD collects positive and negative action pairs for reliable action recognition; TPS applies a patch shift operation for efficient spatial-temporal modeling in transformers for video action recognition; and LST deploys human-part language descriptions to guide skeleton-based action recognition. Extensive experiments demonstrate their efficiency and reliability for human activity understanding.
dcterms.accessRights: open access
dcterms.educationLevel: Ph.D.
dcterms.extent: xv, 146 pages : color illustrations
dcterms.issued: 2023
dcterms.LCSH: Computer vision
dcterms.LCSH: Image analysis
dcterms.LCSH: Motion perception (Vision)
dcterms.LCSH: Pattern recognition systems
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations
Appears in Collections: Thesis