Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/89748
Title: Learning robust visual-semantic mapping for zero-shot learning
Authors: Guo, Jingcai
Degree: Ph.D.
Issue Date: 2021
Abstract: Zero-shot learning (ZSL) aims at recognizing examples (e.g., images) of unseen classes with knowledge transferred from seen classes. This is typically achieved by exploiting a semantic feature space shared by both seen and unseen classes, e.g., attributes or word vectors, as the bridge. In ZSL, the common practice is to train a mapping function between the visual and semantic feature spaces with labeled seen-class examples. At inference time, the learned mapping function is reused on unseen-class examples, and their class labels are recognized by measuring semantic relations under some metric. However, the visual and semantic feature spaces are generally independent and lie on entirely different manifolds. Under such a paradigm, ZSL models easily suffer from the domain shift problem when constructing and reusing the mapping function, which has become the major challenge in ZSL. In this thesis, we explore effective ways to mitigate the domain shift problem and to learn a robust mapping function between the visual and semantic feature spaces. We focus on fully empowering the semantic feature space, which is one of the key building blocks of ZSL.
First, we adaptively adjust the semantic feature space to rectify it for ZSL. Conventional ZSL models generally regard the semantic feature space as fixed during training. However, when visual features are mapped into the semantic feature space, the resulting semantic features are usually overly concentrated, which limits the model's ability to adapt and generalize to more unseen classes. Just as human understanding of concepts is continuously refined, we argue that the semantic feature space also needs to be dynamically adjusted to support more robust learning of the mapping function. Specifically, the adjustment is conducted on both the class prototypes and the global distribution during training. Moreover, we combine the adjustment with a cycle mapping to formulate training as a more efficient framework that not only rectifies the semantic feature space but also speeds up the training process.
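The following is a minimal, illustrative sketch of the conventional ZSL paradigm described above: a mapping from visual features to the semantic space is trained on seen classes, and unseen examples are labeled by their nearest class prototype. The linear mapping, the MSE objective, and the feature dimensions are assumptions chosen for illustration, not the thesis's actual models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions: CNN visual features and class attribute vectors.
visual_dim, semantic_dim = 2048, 85
mapping = nn.Linear(visual_dim, semantic_dim)      # visual -> semantic mapping function
optimizer = torch.optim.Adam(mapping.parameters(), lr=1e-3)

def train_step(visual_feats, seen_attributes, labels):
    """Fit the mapping on labeled seen-class examples."""
    pred = mapping(visual_feats)                   # project images into semantic space
    target = seen_attributes[labels]               # attribute prototype of each label
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def predict_unseen(visual_feats, unseen_attributes):
    """Reuse the learned mapping and rank unseen prototypes by cosine similarity."""
    pred = F.normalize(mapping(visual_feats), dim=-1)
    protos = F.normalize(unseen_attributes, dim=-1)
    return (pred @ protos.t()).argmax(dim=-1)
```

In a sketch like this, the first method's adjustment could be loosely approximated by making the class prototypes themselves learnable (e.g., wrapping `seen_attributes` in `nn.Parameter`) so they are refined alongside the mapping; the thesis's actual adjustment of prototypes and the global distribution is more involved.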
Second, we align the manifold structures of the visual and semantic feature spaces by expanding the semantic features. Compared with our first method, which directly adjusts the semantic feature space, the expansion process is more conservative and gentle. Specifically, we build on an autoencoder-based model to generate several auxiliary semantic features and combine them with the original ones to expand the space. The expansion is jointly guided by an embedded manifold extracted from the visual feature space, which retains its geometric and structural information. By aligning the two feature spaces, the trained mapping function becomes more robust and better matched, which significantly mitigates the domain shift problem.
Last, we take a further step and explore the correlation between the visual and semantic feature spaces from a more fine-grained perspective. Unlike most existing works and our previous methods, we decompose an image example into several parts and use an example-level, graph-based model to measure and exploit the relations among these parts. Taking advantage of recently developed graph neural networks, we further formulate ZSL as a graph-to-semantic mapping problem, which better exploits the visual-semantic correlation and the local substructure information within each example.
In summary, this thesis aims to fully empower the semantic feature space and to design effective solutions that mitigate the domain shift problem and thereby obtain a more robust visual-semantic mapping function for ZSL. Extensive experiments on various datasets demonstrate the effectiveness of the proposed methods.
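As a rough illustration of the graph-to-semantic idea in the last method, the sketch below decomposes an image into part features, aggregates their relations with a single hand-written message-passing step, and maps the pooled graph embedding into the semantic space. The part extractor, the fully connected relation graph, and all layer sizes are assumptions for illustration; the thesis uses graph neural networks rather than this simplified propagation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartGraphToSemantic(nn.Module):
    """Toy example-level graph model: image parts -> part relations -> semantic features."""
    def __init__(self, part_dim=512, hidden_dim=256, semantic_dim=85):
        super().__init__()
        self.propagate = nn.Linear(part_dim, hidden_dim)     # transform aggregated parts
        self.to_semantic = nn.Linear(hidden_dim, semantic_dim)

    def forward(self, part_feats, adjacency):
        # part_feats: (num_parts, part_dim); adjacency: (num_parts, num_parts)
        norm_adj = adjacency / adjacency.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        messages = norm_adj @ part_feats                     # aggregate related parts
        node_emb = F.relu(self.propagate(messages))
        graph_emb = node_emb.mean(dim=0)                     # pool parts into one embedding
        return self.to_semantic(graph_emb)                   # predict semantic features

# Usage: five parts of one image connected by a fully connected relation graph.
parts = torch.randn(5, 512)
adjacency = torch.ones(5, 5)
semantic_pred = PartGraphToSemantic()(parts, adjacency)
```

Recognition would then proceed as in the earlier sketch, by comparing the predicted semantic features against the unseen class prototypes.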
Subjects: Machine learning
Artificial intelligence
Hong Kong Polytechnic University -- Dissertations
Pages: xix, 130 pages : color illustrations
Appears in Collections: Thesis
