Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/98228
Title: Machine learning for human activity analysis and recognition
Authors: Liu, Tianshan
Degree: Ph.D.
Issue Date: 2023
Abstract: The analysis and recognition of human activities in videos are crucial and fundamental topics in computer vision. With the development of machine-learning methods, especially deep-learning-based techniques, and the emergence of large-scale data sets, remarkable improvements have been achieved in the performance of human activity recognition. However, most current research is devoted to analyzing single-person activities captured from third-person views in trimmed videos. This hinders existing approaches from being deployed in more complicated real-world scenarios, such as scenes involving interactions between multiple persons, activities recorded from first-person (egocentric) views, or settings where only raw, long, untrimmed videos are available. Thus, this thesis focuses on investigating effective machine-learning-based models to address these challenges, which arise in four specific tasks: egocentric activity recognition, group activity recognition, concurrent first- and third-person activity recognition, and anomaly event detection in untrimmed videos.
First, videos captured from first-person views usually contain frequent egomotion, cluttered backgrounds, and only partial body movements of the camera wearer, which leads to a scarcity of useful information. Hence, it is vital to sequentially localize the regions relevant to human-object interactions in order to identify the target motion patterns and active objects. This thesis proposes an enhanced attention-tracking method that coherently captures fine-grained human-object interactions in video sequences without requiring extra frame-level annotations, thereby enabling accurate recognition of egocentric activities.
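To make the mechanism concrete, here is a minimal sketch of one common way such attention can be realized with only video-level supervision: a 1x1 convolution scores spatial locations in each frame's feature map, the attended frame vectors are tracked over time by a recurrent layer, and a classifier yields video-level logits. This is not the thesis's model; the shapes, module choices, and the GRU-based tracker are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttentionPool(nn.Module):
    """Scores each spatial location, then pools features as a weighted sum."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # 1x1 conv -> saliency map

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B*T, C, H, W) per-frame feature maps from a CNN backbone
        attn = self.score(feats).flatten(2).softmax(dim=-1)         # (B*T, 1, H*W)
        pooled = torch.bmm(attn, feats.flatten(2).transpose(1, 2))  # (B*T, 1, C)
        return pooled.squeeze(1)                                    # (B*T, C)

class EgocentricRecognizer(nn.Module):
    """Attend to interaction regions per frame, then track them over time."""
    def __init__(self, channels: int = 512, num_classes: int = 10):
        super().__init__()
        self.attend = SpatialAttentionPool(channels)
        self.temporal = nn.GRU(channels, 256, batch_first=True)
        self.head = nn.Linear(256, num_classes)

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        # clip_feats: (B, T, C, H, W) backbone features for T frames
        b, t, c, h, w = clip_feats.shape
        frame_vecs = self.attend(clip_feats.reshape(b * t, c, h, w)).reshape(b, t, c)
        _, last = self.temporal(frame_vecs)  # (1, B, 256) final tracker state
        return self.head(last.squeeze(0))    # video-level logits; no frame labels needed

# Usage: random tensors stand in for a CNN backbone's per-frame features.
model = EgocentricRecognizer()
logits = model(torch.randn(2, 8, 512, 7, 7))  # -> shape (2, 10)
```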
Second, a group activity in a scene generally involves complex interactions between multiple persons. Without knowing the specific interaction patterns, it is challenging to model the hidden relationships among subjects from the video inputs. This thesis explores a visual-semantic graph neural network (VS-GNN), which aims to simultaneously exploit abundant visual modalities and the semantic hierarchies in the label space. By discovering the diverse relations between individuals and groups, the proposed VS-GNN improves the performance of group activity recognition.
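As a rough illustration of the visual-semantic idea, the toy sketch below performs one round of message passing over person nodes, with a similarity-based relation graph and learned label embeddings standing in for the semantic hierarchy. It is an assumption-laden simplification, not the proposed VS-GNN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVSGNN(nn.Module):
    """One round of message passing over person nodes, scored against label embeddings."""
    def __init__(self, dim: int = 256, num_groups: int = 8):
        super().__init__()
        self.msg = nn.Linear(dim, dim)                   # transform neighbor messages
        self.upd = nn.Linear(2 * dim, dim)               # fuse self + aggregated message
        self.label_emb = nn.Embedding(num_groups, dim)   # stand-in for the semantic side

    def forward(self, persons: torch.Tensor) -> torch.Tensor:
        # persons: (B, N, dim) visual features of N individuals in the scene
        sim = torch.softmax(persons @ persons.transpose(1, 2), dim=-1)  # (B, N, N) relations
        agg = sim @ self.msg(persons)                                   # aggregate neighbors
        nodes = F.relu(self.upd(torch.cat([persons, agg], dim=-1)))     # updated person nodes
        group = nodes.mean(dim=1)                                       # (B, dim) group readout
        # Score groups by similarity to label embeddings (visual-semantic matching).
        return group @ self.label_emb.weight.t()                        # (B, num_groups) logits

model = ToyVSGNN()
logits = model(torch.randn(2, 12, 256))  # 12 people per scene -> shape (2, 8)
```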
Third, this thesis investigates a novel task, concurrent first- and third-person activity recognition (CFT-AR), a hybrid scenario that has not been studied in previous works. A new activity data set, PolyU CFT Daily, was first created to facilitate research on CFT-AR. This data set inherits the characteristics of egocentric videos and involves multiple persons in varied scenes, which poses unprecedented challenges. Then, a comprehensive solution is presented, which learns both holistic scene-level and local instance-level representations to provide sufficient discriminative patterns for recognizing both first- and third-person activities.
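A minimal sketch of the dual-level idea follows, fusing a holistic scene feature with pooled instance features before classification; the fusion rule, the max-pooling over instances, and all dimensions are assumptions rather than the thesis's design.

```python
import torch
import torch.nn as nn

class SceneInstanceFusion(nn.Module):
    """Combine a global scene feature with per-person instance features."""
    def __init__(self, dim: int = 512, num_classes: int = 20):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.head = nn.Linear(dim, num_classes)

    def forward(self, scene: torch.Tensor, instances: torch.Tensor) -> torch.Tensor:
        # scene: (B, dim) holistic video feature; instances: (B, K, dim) per-person features
        local = instances.max(dim=1).values            # keep the strongest instance evidence
        joint = self.fuse(torch.cat([scene, local], dim=-1))
        return self.head(joint)                        # logits over first- and third-person classes

model = SceneInstanceFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 5, 512))  # -> shape (4, 20)
```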
Fourth, anomaly event detection (AED) aims to identify the snippets involving anomalous activities or behaviors in a long, untrimmed video. In particular, the weakly supervised (WS) setting is a promising pipeline for AED, as it relies solely on cheap video-level labels while significantly improving detection performance. Current WS-AED methods tend to employ multimodal inputs to guarantee the robustness of the detector; however, they rely heavily on the availability of multiple modalities and are computationally expensive when processing long sequences. This thesis designs a privileged knowledge-distillation (KD) framework specifically for the WS-AED task, with the goal of training a lightweight yet effective unimodal detector.
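The general recipe can be sketched as follows, assuming a top-k multiple-instance-learning loss for the weak video-level labels and a simple regression loss toward a frozen multimodal teacher's snippet scores; both losses and all shapes are my assumptions, not the thesis's exact framework.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def topk_mil_loss(scores: torch.Tensor, video_label: torch.Tensor, k: int = 3) -> torch.Tensor:
    # scores: (B, T) per-snippet anomaly scores; video_label: (B,) 0/1 weak labels
    video_score = scores.topk(k, dim=1).values.mean(dim=1)  # video score = mean of top-k snippets
    return F.binary_cross_entropy(video_score, video_label.float())

# Lightweight unimodal student: maps snippet features to anomaly scores in (0, 1).
student = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())

rgb = torch.randn(2, 32, 1024)      # student sees RGB features only (32 snippets per video)
teacher_scores = torch.rand(2, 32)  # stand-in for a frozen multimodal teacher's snippet scores
labels = torch.tensor([1, 0])       # weak video-level labels: anomalous vs. normal

scores = student(rgb).squeeze(-1)   # (B, T) snippet-level anomaly scores
loss = topk_mil_loss(scores, labels) + F.mse_loss(scores, teacher_scores)  # weak + distillation
loss.backward()
```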
Subjects: Human activity recognition
Machine learning
Hong Kong Polytechnic University -- Dissertations
Pages: xxxii, 175 pages: color illustrations
Appears in Collections: Thesis
