Label and computation-efficient deep segmentation for images and point clouds

Li, Ruihuang

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/109469

Title:	Label and computation-efficient deep segmentation for images and point clouds
Authors:	Li, Ruihuang
Degree:	Ph.D.
Issue Date:	2024
Abstract:	Segmentation is an essential task in computer vision, which aims to divide an image or point cloud into several disjoint sets of pixels or points that correspond to different objects or regions. Segmentation has a wide range of applications, such as autonomous driving, robotics, augmented reality, and medical image analysis. Deep learning techniques such as convolutional neural networks (CNNs) and Transformers have significantly improved the accuracy of image and point cloud segmentation, but their computational complexity and their need for vast amounts of labeled data remain bottlenecks for many real-time applications. Researchers have proposed different methods to address these limitations, yet it is still challenging to strike a balance between segmentation accuracy and label efficiency. In this thesis, we propose a series of approaches to improve the label and computation efficiency of model training while maintaining high segmentation accuracy. In Chapter 1, we review some popular label and computation-efficient methods for deep 2D/3D segmentation, and discuss the contributions and organization of this thesis. In Chapter 2, we focus on transferring a model trained on a synthetic source domain to a real target domain. To alleviate the domain shift between source and target domains, we propose a class-balanced pixel-level self-labeling mechanism, which simultaneously clusters pixels and rectifies pseudo labels with the obtained cluster assignments. In Chapter 3, we focus on instance segmentation with box annotations as supervision. We develop a Semantic-aware Instance Mask (SIM) generation paradigm. Instead of heavily relying on local pair-wise affinities among neighboring pixels, we construct a group of category-wise feature centroids as prototypes to identify foreground objects and assign them semantic-level pseudo labels. In Chapter 4, we further improve the computation efficiency of existing instance segmentation models.
To alleviate the increase in computation and memory costs caused by using large masks, we develop a Mask Switch Module (MSM) with negligible computational cost to select the most suitable mask resolution for each instance, achieving high efficiency while maintaining high segmentation accuracy. Finally, in Chapter 5, we study the application of label-efficient segmentation algorithms to open-vocabulary 3D scene understanding. We leverage large vision-language models to extract scene descriptions and category information to build the text modality as supervision. Then we co-embed different modalities into a common space to maximize their synergistic benefits. The methods proposed in this thesis significantly improve the label and computation efficiency of segmentation while maintaining high accuracy. The experimental results demonstrate their superiority over state-of-the-art segmentation methods. Our research provides a promising direction for future research in deep learning-based segmentation applications with limited annotations and computational resources.
Subjects:	Computer vision; Machine learning; Image processing -- Digital techniques; Hong Kong Polytechnic University -- Dissertations
Pages:	xvii, 138 pages : color illustrations
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/13202
