dc.contributor: Department of Computing
dc.creator: Cai, Sijia
dc.title: Learning discriminative models and representations for visual recognition
dcterms.abstract: In the past decade, visual recognition systems have witnessed major advances that led to record performances on challenging datasets. However, designing effective recognition algorithms that are robust to the sizeable extrinsic variability of visual data, particularly when the available training data are insufficient to learn accurate models, remains a significant challenge. In this thesis, we focus on designing effective models and representations for visual recognition by exploiting the characteristics of visual data and vision problems, and by taking advantage of classic sparse models and state-of-the-art deep neural networks. The first part of this thesis provides a probabilistic interpretation of general sparse/collaborative representation based classification. Through a series of probabilistic models of sample-to-sample and sample-to-subspace relationships, we present a probabilistic collaborative representation based classifier (ProCRC) that not only reveals the inner relationship between the coding and classification stages of the original framework, but also achieves superior performance on a variety of challenging visual datasets when coupled with convolutional neural network (CNN) features. We then address the inherent difficulties of part detection and appearance estimation in fine-grained visual categorization (FGVC): we consider the semantic properties of CNN activations and propose an end-to-end architecture based on a kernel learning scheme that captures the higher-order statistics of convolutional activations to model part interaction. The proposed approach yields more discriminative representations and achieves competitive results on the widely used FGVC datasets, even without part annotations. We also consider weakly supervised learning from web videos to alleviate the data scarcity issue in video summarization. This is motivated by the fact that publicly available datasets for video summarization remain limited in size and diversity, making it difficult for most supervised approaches to learn reliable summarization models. We investigate a generative summarization model by extending the variational autoencoder framework to accept both benchmark videos and a large number of web videos. A variational encoder-summarizer-decoder (VESD) is proposed to identify the important segments of a raw video using an attention mechanism and semantic matching with web videos. In this way, VESD provides a practical solution for real-world video summarization. We further incorporate sparse models into deep architectures as structured modelling for learning powerful representations from datasets of limited size. The proposed DCSR-Net transforms a discriminative centralized sparse representation (DCSR) model into a learnable feed-forward network that automatically imposes a discriminative structure on data representations. Experiments indicate that DCSR-Net can be regarded as a general and effective module for learning structured representations.
dcterms.accessRights: open access
dcterms.extent: xviii, 131 pages : color illustrations
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations
dcterms.LCSH: Computer vision
dcterms.LCSH: Image processing
dcterms.LCSH: Visual perception
Appears in Collections: Thesis
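The abstract's first contribution builds on collaborative representation based classification. As rough orientation only, here is a minimal NumPy sketch of the standard (non-probabilistic) CRC baseline that ProCRC reinterprets: code a query over all training samples jointly with a ridge penalty, then assign the class with the smallest class-wise reconstruction residual. The function name and regularization value are illustrative, not the thesis's formulation.

```python
import numpy as np

def crc_classify(y, X, labels, lam=1e-3):
    """Collaborative representation based classification (CRC) sketch.

    y: (d,) query vector; X: (d, n) dictionary whose columns are
    l2-normalized training samples; labels: (n,) class label per column;
    lam: ridge regularization weight.
    """
    # Ridge-regularized coding over ALL classes jointly:
    #   a* = argmin_a ||y - X a||^2 + lam ||a||^2
    a = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    # Assign the class whose own samples best reconstruct the query.
    best, best_r = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        r = np.linalg.norm(y - X[:, mask] @ a[mask])  # class-wise residual
        if r < best_r:
            best, best_r = c, r
    return best
```

The key point the abstract alludes to is that the coding stage (the ridge solve) and the classification stage (the residual comparison) look separate here; ProCRC gives them a joint probabilistic reading.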
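The FGVC contribution captures higher-order statistics of convolutional activations. The simplest instance of that idea is second-order (bilinear) pooling: average the outer products of local descriptors so that feature co-occurrences (part interactions) survive pooling. This is a generic hedged sketch of that baseline, not the thesis's kernel-learning architecture; the normalization steps shown are the common signed-square-root plus l2 recipe.

```python
import numpy as np

def bilinear_pool(feat):
    """Second-order pooling of a convolutional activation map (sketch).

    feat: (h, w, c) activation map. Returns a c*c vector of averaged
    outer products between channel responses at each spatial location.
    """
    h, w, c = feat.shape
    F = feat.reshape(h * w, c)           # one c-dim descriptor per location
    G = F.T @ F / (h * w)                # averaged outer product, shape (c, c)
    v = G.flatten()
    v = np.sign(v) * np.sqrt(np.abs(v))  # signed square-root normalization
    n = np.linalg.norm(v)
    return v / n if n > 0 else v         # l2 normalization
```

Because `G[i, j]` aggregates how strongly channels `i` and `j` fire together, the pooled vector encodes pairwise feature interactions that a plain average-pooled vector discards.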

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.