Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/109469
Title: Label and computation-efficient deep segmentation for images and point clouds
Authors: Li, Ruihuang
Degree: Ph.D.
Issue Date: 2024
Abstract: Segmentation is an essential task in computer vision that aims to divide an image or point cloud into disjoint sets of pixels or points corresponding to different objects or regions. It has a wide range of applications, such as autonomous driving, robotics, augmented reality, and medical image analysis. Deep learning techniques such as convolutional neural networks (CNNs) and Transformers have significantly improved the accuracy of image and point cloud segmentation, but their computational complexity and their need for vast amounts of labeled data remain bottlenecks for many real-time applications. Researchers have proposed various methods to address these limitations, yet striking a balance between segmentation accuracy and label efficiency remains challenging. In this thesis, we propose a series of approaches that improve the label and computation efficiency of model training while maintaining high segmentation accuracy.
In Chapter 1, we review popular label- and computation-efficient methods for deep 2D/3D segmentation and discuss the contributions and organization of this thesis. In Chapter 2, we focus on transferring a model trained on a synthetic source domain to a real target domain. To alleviate the domain shift between the source and target domains, we propose a class-balanced pixel-level self-labeling mechanism, which simultaneously clusters pixels and rectifies pseudo labels with the obtained cluster assignments. In Chapter 3, we focus on instance segmentation with box annotations as supervision. We develop a Semantic-aware Instance Mask (SIM) generation paradigm. Instead of relying heavily on local pair-wise affinities among neighboring pixels, we construct a group of category-wise feature centroids as prototypes to identify foreground objects and assign them semantic-level pseudo labels. In Chapter 4, we further improve the computation efficiency of existing instance segmentation models. To alleviate the increase in computation and memory costs caused by using large masks, we develop a Mask Switch Module (MSM) with negligible computational cost that selects the most suitable mask resolution for each instance, achieving high efficiency while maintaining high segmentation accuracy. Finally, in Chapter 5, we study the application of label-efficient segmentation algorithms to open-vocabulary 3D scene understanding. We leverage large vision-language models to extract scene descriptions and category information, which form a text modality used as supervision. We then co-embed the different modalities into a common space to maximize their synergistic benefits.
The methods proposed in this thesis significantly improve the label and computation efficiency of segmentation while maintaining high accuracy. The experimental results demonstrate their superiority over state-of-the-art segmentation methods. Our research provides a promising direction for future work on deep learning-based segmentation with limited annotations and computational resources.
Subjects: Computer vision
Machine learning
Image processing -- Digital techniques
Hong Kong Polytechnic University -- Dissertations
Pages: xvii, 138 pages : color illustrations
Appears in Collections: Thesis
