Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/106283
Title: | Few-shot intent detection with pre-trained language models: transferability, expressiveness and efficiency
Authors: | Zhang, Haode
Degree: | Ph.D.
Issue Date: | 2024
Abstract: | The identification of user intents is a fundamental component of a task-oriented dialogue system, with the aim of detecting the intent underlying a user's utterance so that an appropriate response can be provided. Typically, intent detection is formulated as a text classification task, which has benefited from the success of deep learning techniques. However, acquiring a large number of annotations for training is expensive. This thesis addresses the challenge of few-shot intent detection, where the goal is to develop a highly effective intent classifier using only a limited amount of annotated data, thereby improving data efficiency.

We first study cross-domain transferability for few-shot intent detection, exploring the possibility of jointly utilizing abundant labeled data in a source domain and easily available unlabeled data in a target domain to train an intent classifier with reasonable performance. We investigate techniques for transferring knowledge across domains and adapting to a new domain. Leveraging data from public intent detection datasets, we train IntentBERT, a backbone that transfers knowledge from multiple diverse intent detection domains and significantly improves performance in the target domain. With easily available unlabeled data in the target domain, the performance is further enhanced.

Next, to improve the expressiveness of IntentBERT, we focus on a particular property of pre-trained language models (PLMs): anisotropy, an undesirable geometric property of the feature space. We discover that supervised pre-training yields an anisotropic feature space, which may suppress the expressive power of the semantic representations. To mitigate the problem, we propose to enhance supervised pre-training by regularizing the feature space towards isotropy. We propose two regularizers, based on contrastive learning and the correlation matrix respectively, and demonstrate their effectiveness through extensive experiments. Through joint supervised pre-training and isotropization, we achieve improved performance in few-shot intent detection.

Then, to further improve data efficiency, we revisit the overfitting phenomenon, continual pre-training, and direct fine-tuning of PLMs in the context of few-shot intent detection. Although the prevailing approach to few-shot intent detection is continual pre-training, i.e., fine-tuning PLMs on external resources, our study demonstrates that continual pre-training may not be necessary. Specifically, we find that the overfitting issue of PLMs may not be as severe as previously believed: directly fine-tuning PLMs with only a handful of labeled examples already yields decent results, and the performance gap quickly shrinks as the amount of labeled data grows. We further enhance the performance of direct fine-tuning with context augmentation and sequential self-distillation. Comprehensive experiments on real-world benchmarks show that, given only two or more labeled samples per class, the enhanced direct fine-tuning outperforms many strong baselines that utilize external data sources for continual pre-training.
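To make the sequential self-distillation idea concrete, the following is a minimal sketch. It assumes a simplified classifier interface where model(inputs) returns class logits; the hyper-parameters (temperature, mixing weight, number of generations) and the choice to restart each generation from the original weights are illustrative assumptions rather than the exact thesis implementation.

    # Sketch of sequential self-distillation on top of direct fine-tuning.
    # Assumption: `model(inputs)` returns class logits; hyper-parameters are illustrative.
    import copy
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
        """Cross-entropy on the gold labels plus KL divergence to the teacher's soft labels."""
        ce = F.cross_entropy(student_logits, labels)
        kl = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)
        return alpha * ce + (1.0 - alpha) * kl

    def sequential_self_distillation(model, train_loader, generations=3, epochs=5, lr=2e-5):
        """Fine-tune a chain of models, each generation distilled from the previous one."""
        teacher = None
        for _ in range(generations):
            student = copy.deepcopy(model)  # restart from the original (pre-trained) weights
            optimizer = torch.optim.AdamW(student.parameters(), lr=lr)
            student.train()
            for _ in range(epochs):
                for inputs, labels in train_loader:
                    logits = student(inputs)
                    if teacher is None:
                        # First generation: plain direct fine-tuning with cross-entropy.
                        loss = F.cross_entropy(logits, labels)
                    else:
                        with torch.no_grad():
                            teacher_logits = teacher(inputs)
                        loss = distillation_loss(logits, teacher_logits, labels)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
            teacher = student.eval()  # the latest generation becomes the next teacher
        return teacher

In this sketch the chain is trained only on the labeled task examples; each generation simply provides soft targets for the next.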
Finally, to enhance computational efficiency, we study model compression for intent detection with limited labeled data. Traditional approaches to model compression, such as model pruning and distillation, typically rely on access to large amounts of data. However, such datasets are not readily available in the few-shot scenario. To overcome this challenge, we propose a scheme that capitalizes on off-the-shelf generative PLMs for data augmentation. Furthermore, we introduce a vocabulary pruning technique employing a nearest neighbour matching scheme. Through extensive experiments, we demonstrate the efficacy of the proposed method: we can compress the model by a factor of 21, thus enabling the deployment of the model in resource-constrained scenarios, including mobile devices and embedded systems.
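As an illustration of the nearest-neighbour vocabulary pruning idea, below is a minimal sketch. It assumes the pruned vocabulary is the set of token ids observed in the (augmented) task data and that every other token is remapped to the retained token with the most similar input embedding under cosine similarity; the function name and these choices are assumptions for illustration, not necessarily the exact scheme in the thesis.

    # Sketch of vocabulary pruning via nearest-neighbour matching in embedding space.
    # Assumption: tokens outside the kept set are remapped by cosine similarity of input embeddings.
    import torch
    import torch.nn.functional as F

    def build_pruned_vocab_mapping(embeddings: torch.Tensor, keep_ids) -> dict:
        """Map every original token id to a token id retained in the pruned vocabulary.

        embeddings: (vocab_size, dim) input embedding matrix of the PLM.
        keep_ids:   token ids observed in the (augmented) task data.
        """
        keep = torch.tensor(sorted(set(keep_ids)), dtype=torch.long)
        kept_emb = F.normalize(embeddings[keep], dim=-1)   # (|keep|, dim)
        all_emb = F.normalize(embeddings, dim=-1)          # (vocab_size, dim)
        sims = all_emb @ kept_emb.t()                      # cosine similarities
        nearest = sims.argmax(dim=-1)                      # index into `keep`
        # Kept tokens map to themselves (self-similarity is maximal); unseen tokens
        # are redirected to their nearest retained neighbour in embedding space.
        return {tok: int(keep[nearest[tok]]) for tok in range(embeddings.size(0))}

With such a mapping, the embedding matrix can be shrunk to the retained rows (embeddings[keep]) while every input remains tokenizable after pruning.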
Subjects: | Dialogue analysis; Machine learning; Natural language processing (Computer science); Hong Kong Polytechnic University -- Dissertations
Pages: | xiv, 95 pages : color illustrations |
Appears in Collections: | Thesis
Access: | View full-text via https://theses.lib.polyu.edu.hk/handle/200/12927