Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/96342
Title: Effective and efficient optimization methods for deep learning
Authors: Yong, Hongwei
Degree: Ph.D.
Issue Date: 2022
Abstract: Optimization techniques play an essential role in deep learning, and a favorable optimization approach can greatly boost the final performance of the trained deep neural network (DNN). Generally speaking, a good DNN optimizer has two major goals: accelerating the training process and improving the generalization capability of the model. In this thesis, we study effective and efficient optimization techniques for deep learning.
Batch normalization (BN) is a key technique for stable and effective DNN training. It can simultaneously improve the training speed and the generalization performance of the model. However, it is well known that the training and inference stages of BN are inconsistent, and the performance of BN drops significantly when the training batch size is small. In Chapter 2, we prove that BN actually introduces a certain level of noise into the sample mean and variance during training. We then propose a Momentum Batch Normalization (MBN) method to control this noise level and improve training with BN. Meanwhile, in Chapter 3, we put forward an effective inference method for BN, i.e., Batch Statistics Regression (BSR), which uses instance statistics to predict batch statistics with a simple linear regression model. BSR estimates the batch statistics more accurately, making the training and inference of BN much more consistent. We evaluate these methods on CIFAR10/CIFAR100, Mini-ImageNet, ImageNet, and other datasets.
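The sketch below is only a minimal illustration of the idea behind MBN as summarized above: normalizing with momentum-smoothed batch statistics instead of the raw per-batch mean and variance, so that less statistical noise is injected when the batch is small. The module name, the fixed momentum value, and the absence of any noise-control schedule are simplifying assumptions, not the thesis's exact formulation.

```python
import torch
import torch.nn as nn

class MomentumBatchNorm2d(nn.Module):
    """Illustrative momentum-smoothed batch normalization (not the exact MBN)."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        super().__init__()
        self.momentum = momentum  # assumed fixed; MBN controls the noise level more carefully
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            # Current mini-batch statistics (per channel).
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            # Blend current and historical statistics: the smaller the momentum,
            # the lower the noise injected by a small batch.
            use_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            use_var = (1 - self.momentum) * self.running_var + self.momentum * var
            # Keep the running buffers up to date for inference.
            self.running_mean = use_mean.detach()
            self.running_var = use_var.detach()
        else:
            use_mean, use_var = self.running_mean, self.running_var
        x_hat = (x - use_mean[None, :, None, None]) / torch.sqrt(use_var[None, :, None, None] + self.eps)
        return self.weight[None, :, None, None] * x_hat + self.bias[None, :, None, None]
```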
Gradient descent is the dominant method for updating DNN models owing to its simplicity and efficiency in handling large-scale data. In Chapter 4, we present a simple yet effective DNN optimization technique, namely gradient centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean. GC can be viewed as a projected gradient descent method with a constrained loss function. We show that GC regularizes both the weight space and the output feature space, so it can boost the generalization performance of DNNs. In Chapter 5, we present a feature stochastic gradient descent (FSGD) method to approximate the desired feature outputs with one-step gradient descent. FSGD improves the singularity of the feature space and thus enhances the efficacy of feature learning. Finally, in Chapter 6 we propose a novel optimization approach, namely Embedded Feature Whitening (EFW), which overcomes several drawbacks of conventional feature whitening methods while inheriting their advantages. EFW adjusts only the weight gradients using the whitening matrix, without changing any part of the network, so it can be easily adopted to optimize pre-trained and well-defined DNN architectures. We evaluate these methods on various tasks, including image classification on CIFAR10/CIFAR100, ImageNet, and fine-grained image classification datasets, as well as object detection and instance segmentation on COCO, and they achieve clear performance gains.
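As a concrete illustration of the centralization operation that GC applies, the following sketch subtracts the per-filter mean from each weight gradient before the optimizer step. The helper name, the rule of skipping one-dimensional parameters (biases, BN parameters), and the toy usage example are illustrative assumptions rather than the thesis's exact implementation.

```python
import torch
import torch.nn as nn

def centralize_gradients(model):
    """Subtract the per-filter mean from each multi-dimensional weight gradient."""
    for p in model.parameters():
        if p.grad is None or p.grad.dim() <= 1:
            continue  # skip biases and normalization parameters (an assumed convention)
        # Mean over all dimensions except the output-channel dimension.
        dims = tuple(range(1, p.grad.dim()))
        p.grad.sub_(p.grad.mean(dim=dims, keepdim=True))

# Hypothetical usage inside a standard training loop (model, data, and
# optimizer settings are placeholders):
model = nn.Linear(8, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(16, 8)).pow(2).mean()
loss.backward()
centralize_gradients(model)  # centralize the weight gradients in place
opt.step()
```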
In summary, this thesis presents five deep learning optimization methods. Among them, MBN and BSR improve the training and inference of BN, respectively; GC adjusts the weight gradients with a centralization operation; FSGD provides a practical approach to performing feature-driven gradient descent; and EFW embeds existing feature whitening methods into optimization algorithms for effective deep learning. Extensive experiments demonstrate their effectiveness and efficiency for DNN optimization.
Subjects: Machine learning
Hong Kong Polytechnic University -- Dissertations
Pages: xxi, 159 pages : color illustrations
Appears in Collections: Thesis
