Deep learning models for human parsing and action recognition : architectural design, model compression and data augmentation

Jiang, Yalong

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/82907

Title:	Deep learning models for human parsing and action recognition : architectural design, model compression and data augmentation
Authors:	Jiang, Yalong
Degree:	Ph.D.
Issue Date:	2020
Abstract:	Methods for human parsing and action recognition have long been critical techniques for visually describing human behaviours. Recent developments in Convolutional Neural Networks (CNNs) have brought significant improvements to these tasks, thanks to the availability of an increased amount of training data. In this study, I focus on three major problems that hinder the application of deep learning models to human parsing and action recognition. Firstly, existing human parsing models suffer from incomplete feature representations, which may lead to failures in some difficult cases. I propose two novel architectures with comprehensive feature representations to improve the robustness of models. The first architecture explores the relationship between human parsing and pose estimation. A module for pose estimation is integrated with a human parsing module to improve performance under complex backgrounds and variations in human poses. The second architecture adopts a CNN module for depth estimation, which pre-processes input images for the segmentation module and can improve pixel classification near boundaries. The availability of abundant labelled data for pose estimation and depth estimation boosts the performance of human parsing. Secondly, the inappropriate capacity of a CNN model and insufficient training data both contribute to failures in perceiving the semantic information of detailed regions. A high-capacity model cannot generalize to the variations in human parsing and action recognition. In my work, three novel methods for reducing the complexity of convolutional layers are proposed. The first method applies orthogonal weight normalization for weight initialization; performance is improved while complexity is reduced. The second method adjusts the dependency among convolutional kernels by conducting principal component analysis on the kernels. The third method clusters the convolutional kernels in each layer based on Euclidean distance and evaluates the contributions of different clusters by examining the changes in training and test accuracy when each cluster is removed. Higher computational efficiency and better performance can be achieved at the same time. This method can also be applied to models that are pretrained on other tasks. Besides model compression, I further propose a method to evaluate the complexity of a human parsing task. The variances in scales and locations, and the consistency of predictions from different models, are studied. Additionally, a layer-wise training scheme is proposed to better explore the potential of a CNN model. Thirdly, human parsing models are used to improve the robustness of action recognition models. I extend human parsing models to predict the correspondences between RGB images and surface-based representations of human bodies. The predictions are used to determine the task-irrelevant content in inputs, which increases the domain discrepancy. The proposed scheme reduces the discrepancy between the training data and the test data and improves the performance of action recognition. The above-mentioned methods are evaluated on the Pascal Person Part dataset and the Look into Person dataset for human parsing, the COCO dataset for pose estimation, the MegaDepth dataset for depth estimation, and the HMDB-51 dataset for action recognition.
Subjects:	Hong Kong Polytechnic University -- Dissertations; Pattern recognition systems; Human activity recognition; Machine learning
Pages:	xxvii, 209 pages : color illustrations
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/10318
