Towards reliable CNN architecture design for visual recognition

Li, Lida

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/91699

DC Field	Value	Language
dc.contributor	Department of Computing	-
dc.creator	Li, Lida	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/11359	-
dc.language.iso	English	-
dc.title	Towards reliable CNN architecture design for visual recognition	-
dc.type	Thesis	-
dcterms.abstract	While several popular network architectures have been developed and widely used, designing effective and efficient convolutional neural network (CNN) architectures for visual recognition remains an important topic. The design of reliable CNN architectures faces three main challenges: how to reduce the computational cost, how to improve the accuracy, and how to enhance the robustness against adversarial attacks. In this thesis, we study the design of reliable CNN architectures for visual recognition. In Chapter 1, we review some common CNN architectures and their design methods for visual recognition, and discuss the contributions and organization of this thesis. In Chapter 2, we present a detachable second-order pooling network to improve the performance of first-order CNNs in image classification while keeping the same computational cost at the testing stage. In Chapter 3, we propose to train deep CNNs with a learnable sparse transform (LST), which learns to convert the input features into a more compact and sparser domain together with the CNN training process. The proposed LST is more effective in reducing the spatial and channel-wise feature redundancies than the conventional Conv2d, and it can be efficiently implemented with existing CNN modules for seamless training and inference. We also present a hybrid LST-ReLU activation to enhance the robustness of the learned CNN models. In Chapter 4, we further improve LST to faithfully build CNNs for visual recognition. The proposed LST v2 employs hierarchical depth-wise separable convolution to allow incomplete yet flexible expansion. LST v2 can achieve comparable or even higher accuracy than LST-Net in a wide range of visual recognition tasks. Finally, in Chapter 5, we study the application of LST to adversarial attacks. A robust convolutional layer with multiple kernels, namely RConv-MK, is proposed to improve the robustness of LST against various types of image corruptions and manually designed adversarial attacks. In summary, in this thesis we present four reliable CNN architecture design methods: a detachable second-order pooling network, a learnable sparse transform and its improved version, and a robust convolutional layer. Extensive experiments demonstrate their effectiveness and efficiency for accurate, lightweight and robust visual recognition.	-
dcterms.accessRights	open access	-
dcterms.educationLevel	Ph.D.	-
dcterms.extent	xx, 191 pages : color illustrations	-
dcterms.issued	2021	-
dcterms.LCSH	Neural networks (Computer science)	-
dcterms.LCSH	Image processing -- Digital techniques	-
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	-
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/11359
