Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/91699
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Computing | - |
dc.creator | Li, Lida | - |
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/11359 | - |
dc.language.iso | English | - |
dc.title | Towards reliable CNN architecture design for visual recognition | - |
dc.type | Thesis | - |
dcterms.abstract | While several popular network architectures have been developed and are widely used, designing effective and efficient convolutional neural network (CNN) architectures for visual recognition remains an important topic. The design of reliable CNN architectures faces three main challenges: how to reduce the computational cost, how to improve the accuracy, and how to enhance the robustness against adversarial attacks. In this thesis, we study the design of reliable CNN architectures for visual recognition. In Chapter 1, we review some common CNN architectures and their design methods for visual recognition, and discuss the contributions and organization of this thesis. In Chapter 2, we present a detachable second-order pooling network that improves the performance of first-order CNNs in image classification while keeping the same computational cost at the testing stage. In Chapter 3, we propose to train deep CNNs with a learnable sparse transform (LST), which learns to convert the input features into a more compact and sparser domain jointly with the CNN training process. The proposed LST is more effective than the conventional Conv2d in reducing spatial and channel-wise feature redundancies, and it can be efficiently implemented with existing CNN modules for seamless training and inference. We also present a hybrid LST-ReLU activation to enhance the robustness of the learned CNN models. In Chapter 4, we further improve LST to faithfully build CNNs for visual recognition. The proposed LST v2 employs hierarchical depth-wise separable convolution to allow an incomplete yet flexible expansion. LST v2 can achieve comparable or even higher accuracy than LST-Net across a wide range of visual recognition tasks. Finally, in Chapter 5, we study LST in the context of adversarial attacks. A robust convolutional layer with multiple kernels, namely RConv-MK, is proposed to improve the robustness of LST against various types of image corruptions and manually designed adversarial attacks. In summary, this thesis presents four reliable CNN architecture design methods: a detachable second-order pooling network, a learnable sparse transform and its improved version, and a robust convolutional layer. Extensive experiments demonstrate their effectiveness and efficiency for accurate, lightweight, and robust visual recognition. | - |
dcterms.accessRights | open access | - |
dcterms.educationLevel | Ph.D. | - |
dcterms.extent | xx, 191 pages : color illustrations | - |
dcterms.issued | 2021 | - |
dcterms.LCSH | Neural networks (Computer science) | - |
dcterms.LCSH | Image processing -- Digital techniques | - |
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | - |
Appears in Collections: | Thesis |
Access: view the full text via https://theses.lib.polyu.edu.hk/handle/200/11359
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.