Title: Self-driven learning for large-scale object detection
Authors: Wang, Keze
Degree: Ph.D.
Issue Date: 2019
Abstract: Aiming at finding instances of real-world objects in images or video sequences, object detection has attracted great interest in the computer vision research community. Benefiting from the rapid advancement of deep convolutional neural networks (CNNs), remarkable progress has been achieved in object detection. Currently, most efforts have been spent on designing powerful network architectures, e.g., residual networks (ResNet) [38] and single shot multi-box detectors (SSD) [72], to improve feature learning and computation speed. However, existing object detection methods require massive data collection and annotation, which is quite expensive. Hence, how to leverage large-scale unlabeled data to improve detection performance is a crucial and long-standing problem in object detection. To address this issue, many active learning (AL) methods have been proposed, which retrieve a small number of representative unlabeled samples for manual annotation. However, these AL methods ignore the remaining majority of samples (i.e., those with low uncertainty or high prediction confidence). In this thesis, we aim to develop cost-effective methods that mine both the majority and the minority of unlabeled samples, minimizing user annotation effort while training more powerful object detectors. First, we naturally combine AL and self-paced learning (SPL) [57] to automatically pseudo-label the majority of high-confidence samples and incorporate them into training with a weak expert re-certification strategy. This implementation can be formulated as a concise active SPL optimization problem, which advances the development of SPL by supplementing it with a rational dynamic curriculum constraint. The required number of annotated samples is significantly decreased without sacrificing performance. A dramatic reduction of user effort is also achieved compared with other state-of-the-art AL techniques.
In addition, the mixture of SPL and AL improves not only the classifier accuracy but also the robustness against noisy data.
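The active SPL step described above can be illustrated with a single selection round. This is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes the standard hard SPL regularizer (for which the optimal weight is v_i = 1 exactly when a sample's loss falls below the threshold λ), uses -log of the top class probability as the pseudo-label loss, and ranks the remaining samples by predictive entropy for manual annotation. The function name `active_self_paced_round` and its parameters are hypothetical.

```python
import numpy as np


def active_self_paced_round(probs, lam, budget):
    """One round of a hypothetical active self-paced selection (sketch only).

    probs:  (N, C) predicted class probabilities for N unlabeled samples.
    lam:    self-paced threshold; samples whose pseudo-label loss is below
            lam are treated as "easy" and pseudo-labeled automatically.
    budget: number of remaining samples sent to a human annotator.
    Returns (easy_indices, pseudo_labels_for_easy, query_indices).
    """
    probs = np.asarray(probs, dtype=float)
    pseudo_labels = probs.argmax(axis=1)
    # Loss incurred if the predicted class is taken as the label: -log p_max.
    loss = -np.log(np.clip(probs.max(axis=1), 1e-12, 1.0))
    # Closed-form minimizer of sum_i v_i * loss_i - lam * sum_i v_i
    # over v_i in {0, 1}: v_i = 1 exactly when loss_i < lam.
    v = (loss < lam).astype(int)
    easy = np.flatnonzero(v)
    # Among the non-easy samples, query the most uncertain (highest entropy).
    rest = np.flatnonzero(v == 0)
    entropy = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    order = rest[np.argsort(-entropy[rest], kind="stable")]
    query = order[:budget]
    return easy, pseudo_labels[easy], query
```

Raising `lam` between rounds lets progressively harder samples be pseudo-labeled, which plays the role of the curriculum; the weak expert re-certification of pseudo-labels and the dynamic curriculum constraint mentioned above are not modeled in this sketch.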
Second, we present a principled self-supervised sample mining (SSM) scheme to account for the real challenges in object detection. Specifically, our SSM scheme concentrates on automatically discovering and pseudo-labeling reliable region proposals to enhance the object detector via cross image validation, i.e., pasting these proposals into different labeled images to comprehensively measure their scores under different image contexts. Building on SSM, we propose a new AL framework that gradually incorporates unlabeled or partially labeled data into model learning while minimizing the annotation effort of users. Third, we develop a principled active sample mining (ASM) framework, which involves a selectively switchable sample selection mechanism to determine whether an unlabeled sample should be manually annotated via AL or automatically pseudo-labeled via a novel self-learning process. The proposed process is compatible with mini-batch based training (i.e., using a batch of unlabeled or partially labeled data as a single input). Notably, our method is suitable for detecting object categories that are not seen in the unlabeled data during the learning process. Lastly, we develop a novel memory network module named convolutional memory block (CMB), which equips CNNs with a memory mechanism that enhances their pattern abstraction capability by reusing their rich implicit convolutional structures and the spatial correlations among non-sequential training samples. Specifically, the proposed CMB consists of one internal memory (i.e., a set of feature maps) and three specific controllers, which together enable a powerful yet efficient memory manipulation mechanism. Our proposed CMB is designed to capture and store the representative dependencies or correlations among training samples according to specific learning tasks, and to further employ these stored dependencies to enhance the representation of convolutional layers.
In this way, our CMB allows the CNN architecture to be lightweight and to require less training data. In summary, this thesis focuses on exploiting large-scale unlabeled or partially labeled data incrementally to improve object detection performance. Extensive experiments on public benchmarks demonstrate that our proposed approaches achieve performance comparable to alternative methods but with significantly fewer annotations.
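The cross image validation idea in SSM can be illustrated in a few lines. This is a minimal sketch, not the thesis's procedure: it assumes the proposal is pasted at the centre of each labeled image, that a detector is available as a hypothetical callable `score_fn(image, box, cls)`, and that a proposal is accepted when its mean score across the pasted contexts reaches a threshold. The function name `cross_image_validation`, the placement rule, the averaging, and the default threshold are all illustrative assumptions.

```python
import numpy as np


def cross_image_validation(patch, labeled_images, score_fn, cls, thresh=0.5):
    """Hypothetical sketch of cross image validation for one region proposal.

    patch:          (h, w, 3) crop of a proposal from an unlabeled image.
    labeled_images: list of (H, W, 3) arrays with H >= h and W >= w.
    score_fn:       hypothetical callable(image, box, cls) -> float standing in
                    for the detector's score of class `cls` inside `box`,
                    where box = (x0, y0, x1, y1).
    Returns (mean_score, accepted).
    """
    h, w = patch.shape[:2]
    scores = []
    for img in labeled_images:
        H, W = img.shape[:2]
        canvas = img.copy()  # leave the labeled image itself untouched
        # Assumed placement rule: paste the proposal at the image centre.
        y0, x0 = (H - h) // 2, (W - w) // 2
        canvas[y0:y0 + h, x0:x0 + w] = patch
        scores.append(score_fn(canvas, (x0, y0, x0 + w, y0 + h), cls))
    # Assumed aggregation: mean score over all pasted contexts.
    mean_score = float(np.mean(scores))
    return mean_score, mean_score >= thresh
```

Averaging over several contexts penalizes proposals whose high score depended on their original surroundings; a stricter variant could require the minimum score, rather than the mean, to pass the threshold.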
Subjects: Hong Kong Polytechnic University -- Dissertations
Image processing
Computer vision
Machine learning
Pages: xvi, 119 pages : color illustrations
Appears in Collections:Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.