Title: Hierarchical architectures and learning algorithms for multi-label image classification and scene categorization
Authors: Chen, Zenghai
Degree: Ph.D.
Issue Date: 2014
Abstract: This thesis presents hierarchical architectures and learning algorithms for multi-label image classification and scene categorization. It reports three main contributions: (1) an adaptive recognition model based on neural networks for image annotation; (2) a hierarchical neural approach for multi-instance multi-label image classification; and (3) a hybrid holistic and object-based approach for scene categorization. In the first investigation, we propose an adaptive recognition model based on neural networks for annotating images. The Adaptive Recognition Model (ARM) consists of an adaptive ClassiFication Network (CFN) and a nonlinear Correlation Network (CLN). The adaptive CFN annotates an image with labels, and the CLN uncovers the correlations among labels to refine the annotation. An ARM annotates an image in two stages. In the first stage, the features extracted from regions of the input image are fed to the CFN to produce classification labels. In the second stage, the CLN uses label correlations learnt from the training images to refine the classification result. The ARM works in a forward-propagating manner, which makes image annotation highly efficient. Furthermore, the computational time of an ARM is insensitive to the number of regions in the input image and to the vocabulary size. In this thesis, the effect of label correlation on image annotation is comprehensively studied on a real image dataset and a synthetic image dataset. Using a controllable synthetic dataset helps to study the function of label correlation systematically and to analyze the performance of the ARM effectively. Experimental results demonstrate the efficiency and effectiveness of the proposed ARM. In the second investigation, we propose a multi-instance multi-label algorithm based on hierarchical neural networks for image classification.
Image annotation can be regarded as a multi-instance multi-label image classification problem. Unlike the model in the first investigation, which requires regional ground truth for training, the model proposed in the second investigation can be trained using image-level ground truth only. In particular, the proposed model, termed Multi-Instance Multi-Label Neural Network (MIMLNN), consists of two stages of MultiLayer Perceptrons (MLPs). For multi-instance multi-label image classification, all the regional features are fed to the first-stage MLPs, with one MLP copy processing one image region. After that, the MLP in the second stage combines the outputs of the first-stage MLPs to produce the final labels for the input image. The first-stage MLP is expected to model the relationship between regions and labels, while the second-stage MLP aims to capture the label correlation for classification refinement. The classical error Back-Propagation (BP) approach is adopted to tune the parameters of MIMLNN. Because the traditional gradient descent algorithm suffers from the long-term dependency problem, a refined BP algorithm named Rprop is extended to train MIMLNN effectively. Experiments are conducted on a synthetic dataset and the widely used Corel dataset. Experimental results demonstrate the superior performance of MIMLNN compared with state-of-the-art algorithms for multi-instance multi-label image classification.
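The two-stage forward pass described for MIMLNN can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not state how the second stage combines the first-stage outputs, so the element-wise max pooling used here is an assumption, as are the layer sizes, the sigmoid activation, the random weights, and every function name.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp(inputs, layers):
    """Apply a list of (weights, biases) layers in sequence."""
    out = inputs
    for weights, biases in layers:
        out = dense(out, weights, biases)
    return out


def random_layers(sizes, rng):
    """Randomly initialised layers (illustrative only, no training here)."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mimlnn_forward(regions, stage1, stage2):
    """Shared first-stage MLP per region, then one second-stage MLP."""
    # Stage 1: one copy of the same MLP scores each region against every label.
    region_scores = [mlp(region, stage1) for region in regions]
    # Assumption: pool the per-region scores with an element-wise max so the
    # number of regions per image may vary. The thesis may combine them differently.
    pooled = [max(column) for column in zip(*region_scores)]
    # Stage 2: refine the pooled label scores, where label correlation can be modelled.
    return mlp(pooled, stage2)


rng = random.Random(0)
n_features, n_hidden, n_labels = 4, 5, 3  # hypothetical sizes
stage1 = random_layers([n_features, n_hidden, n_labels], rng)
stage2 = random_layers([n_labels, n_hidden, n_labels], rng)
image_regions = [[rng.random() for _ in range(n_features)] for _ in range(6)]
label_scores = mimlnn_forward(image_regions, stage1, stage2)
predicted = [score > 0.5 for score in label_scores]  # one decision per label
```

Because the first-stage weights are shared across regions and the pooling is a max, the output does not depend on the order of the regions. Training (BP or Rprop) is deliberately left out of this sketch.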
In the third investigation, we target scene categorization (also termed scene classification). We first employ a deep learning algorithm with a hierarchical architecture to classify scenes, and show that the deep learning algorithm is a promising holistic approach for scene classification. After that, we propose a hybrid holistic and object-based approach for scene classification. In particular, if the decisions made by the holistic and object-based approaches are identical, the scene class they agree on is selected as the final decision. Otherwise, a majority voting scheme is employed to make the final decision based on the results of all the classifiers of both the holistic and object-based approaches. At the end of this thesis, a conclusion is drawn and three future research directions are pointed out. All three directions concentrate on deep neural networks. The first direction is to explore an efficient image classification network based on MIMLNN using deep neural networks. In the second direction, we would like to label each image pixel, and then explore deep neural networks to obtain the scene class by learning from all the pixel labels. In the third direction, we would like to perform scene parsing using deep neural networks.
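The hybrid decision rule described for the third investigation can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: the function name, its arguments, and the tie-breaking behaviour (the class seen first among equally frequent votes wins) are all hypothetical.

```python
from collections import Counter


def hybrid_decision(holistic_decision, object_decision, classifier_votes):
    """Combine a holistic and an object-based scene decision.

    holistic_decision, object_decision: the scene class chosen by each approach.
    classifier_votes: scene classes predicted by all individual classifiers
    of both approaches (assumed to be available as a flat list).
    """
    # If both approaches agree, their shared class is the final decision.
    if holistic_decision == object_decision:
        return holistic_decision
    # Otherwise fall back to majority voting over every classifier's result.
    # Assumption: ties go to the class that appears first in classifier_votes.
    return Counter(classifier_votes).most_common(1)[0][0]


agreed = hybrid_decision("kitchen", "kitchen", ["office", "office", "office"])
voted = hybrid_decision("kitchen", "office", ["kitchen", "office", "office"])
```

In the first call the two approaches agree, so the individual votes are ignored; in the second they disagree, so the most frequent class among the votes is returned.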
Subjects: Computer vision -- Mathematics.
Image processing.
Machine learning.
Hong Kong Polytechnic University -- Dissertations
Pages: xviii, 122 p. : ill. (some col.) ; 30 cm.
Appears in Collections: Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.