Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/99168
Title: Transformer-based textual out-of-distribution detection : methods and analysis
Authors: Zhan, Liming
Degree: Ph.D.
Issue Date: 2023
Abstract: The success of machine learning methods relies heavily on the assumption that the test data follow a distribution similar to that of the training data. However, this assumption is frequently violated in real-world scenarios. Detecting distribution shifts between training and inference, referred to as out-of-distribution (OOD) detection, is crucial to prevent models from making unreliable predictions and is particularly important for the safe deployment of deep neural networks. Despite its importance and a surge of research in the vision domain, the problem is often overlooked in natural language processing (NLP).
This thesis aims to address this gap by proposing and evaluating novel transformer-based OOD detection approaches for various NLP classification tasks, such as dialogue intent detection, topic classification, sentiment classification, and question classification.
First, we present an efficient end-to-end learning framework that reduces the complexity of training textual OOD detectors. Because the distribution of OOD samples is arbitrary and unknown at training time, previous methods commonly rely on strong distributional assumptions, such as a mixture of Gaussians, to perform inference, resulting in either complex multi-step training procedures or hand-crafted rules such as confidence-threshold selection for OOD detection. To develop a simpler learning paradigm for textual OOD detection, we propose to train a (K+1)-way discriminative classifier that simulates the test scenario during training. Specifically, we construct a set of pseudo OOD samples at training time, both by synthesizing OOD samples from in-distribution (ID) features via self-supervision and by sampling OOD sentences from readily available open-domain datasets. These pseudo outliers are used to train a discriminative classifier that can be applied directly to, and generalizes well on, the test task.
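To make the (K+1)-way formulation concrete, the sketch below shows one minimal way such a classifier could be trained on pseudo outliers, assuming sentence features have already been extracted with a pre-trained Transformer. The mixing scheme, the open-domain stand-in, and all hyper-parameters are illustrative assumptions, not the exact recipe in the thesis.

```python
# Hedged sketch of (K+1)-way training with pseudo OOD samples. Feature
# extraction with a pre-trained Transformer is assumed to have happened
# already; random tensors stand in for those features, and all names and
# hyper-parameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, DIM, N = 4, 768, 256                       # ID classes, feature dim, ID samples
id_feats = torch.randn(N, DIM)                # stand-in for Transformer [CLS] features
id_labels = torch.randint(0, K, (N,))

# Self-supervised pseudo outliers: mix pairs of ID features from different
# classes, so the result is unlikely to lie on any single class manifold.
perm = torch.randperm(N)
mix_mask = id_labels != id_labels[perm]
n_mix = int(mix_mask.sum())
lam = torch.rand(n_mix, 1)
synth_ood = lam * id_feats[mix_mask] + (1 - lam) * id_feats[perm][mix_mask]

# Open-domain pseudo outliers would be encoded the same way; random stand-in here.
open_ood = torch.randn(128, DIM)

feats = torch.cat([id_feats, synth_ood, open_ood])
labels = torch.cat([id_labels,                                        # classes 0..K-1
                    torch.full((n_mix + 128,), K, dtype=torch.long)]) # extra class K = OOD

clf = nn.Linear(DIM, K + 1)                   # (K+1)-way discriminative head
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(clf(feats), labels)
    loss.backward()
    opt.step()

# At test time, an input is flagged OOD when class K receives the highest probability.
```

Because the detector is just an ordinary classifier with one extra class, no separate scoring rule or threshold search is needed at inference, which is the simplification the framework aims for.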
Second, we address the challenge of low-resource settings for textual OOD detection, a critical problem often encountered in the development of machine learning systems. Despite its significance, this problem has received limited attention in the literature and remains largely unexplored. We conduct a thorough investigation of this problem and identify key research issues. Through our pilot study, we uncover why existing textual OOD detection methods fall short in addressing this issue. Building on these findings, we propose a promising solution that leverages latent representation generation and self-supervision.
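The abstract does not spell out the generation procedure, so the sketch below is a purely hypothetical illustration of "latent representation generation" under scarce ID data: it fits a diagonal Gaussian per class to the few available Transformer features, samples extra latent points from it, and scores OOD-ness by distance to the nearest class mean. Every name and choice here is an assumption, not the thesis's actual method.

```python
# Hypothetical low-resource sketch: per-class Gaussians over the few available
# ID features are sampled to generate additional latent training points, and a
# simple nearest-class-mean distance serves as the OOD score. Illustrative only.
import torch

K, DIM, SHOTS = 4, 768, 5
few_feats = torch.randn(K, SHOTS, DIM)        # stand-in for per-class ID features

means = few_feats.mean(dim=1)                 # (K, DIM) class means
stds = few_feats.std(dim=1) + 1e-4            # (K, DIM) per-class std, kept diagonal

# Generate 20 extra latent samples per class to compensate for scarce ID data.
generated = means.unsqueeze(1) + stds.unsqueeze(1) * torch.randn(K, 20, DIM)

def ood_score(x: torch.Tensor) -> torch.Tensor:
    """Distance to the nearest class mean; larger values suggest OOD inputs."""
    return torch.cdist(x, means).min(dim=1).values

print(ood_score(torch.randn(3, DIM)))
```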
Finally, we delve into Transformer-based representation learning for textual OOD detection. Existing methods commonly adopt a discriminative training objective, maximizing the conditional likelihood p(y|x), which is biased and leads to suboptimal OOD detection performance. To address this issue, we propose a generative training framework based on variational inference that directly optimizes the likelihood of the joint distribution p(x, y). Specifically, our framework takes into account the unique characteristics of textual data and leverages the representations of pre-trained Transformers in an efficient manner.
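As a hedged illustration of what optimizing a bound on the joint likelihood p(x, y) can look like, the sketch below trains a small VAE-style model over Transformer features with a reconstruction term for p(x|z), a KL term, and a classification term for p(y|z). The factorization, architecture, and all names are assumptions; the model reported in the thesis may differ.

```python
# Hedged sketch: an ELBO-style objective over Transformer features as one way
# to optimize a lower bound on log p(x, y). Illustrative assumption, not the
# thesis's actual model.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, Z, K = 768, 64, 4

class JointVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(DIM, 2 * Z)      # q(z|x): mean and log-variance
        self.dec = nn.Linear(Z, DIM)          # p(x|z): reconstructs the feature
        self.cls = nn.Linear(Z, K)            # p(y|z): class likelihood

    def forward(self, x, y):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = F.mse_loss(self.dec(z), x)                        # -log p(x|z), Gaussian
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        nll_y = F.cross_entropy(self.cls(z), y)                   # -log p(y|z)
        return recon + kl + nll_y                                 # negative ELBO on p(x, y)

model = JointVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, DIM)                      # stand-in Transformer [CLS] features
y = torch.randint(0, K, (32,))
loss = model(x, y)
loss.backward()
opt.step()
```

At test time, a generative model of this kind can use the (approximate) likelihood of an input itself as the OOD score, rather than a confidence derived only from p(y|x).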
In summary, this thesis provides novel and effective Transformer-based approaches to the challenges of textual OOD detection. The proposed methods show significant improvements over existing state-of-the-art baselines, and the findings have practical value for improving the robustness of machine learning models in NLP.
Subjects: Machine learning
Text processing (Computer science)
Natural language processing (Computer science)
Hong Kong Polytechnic University -- Dissertations
Pages: xvii, 121 pages : color illustrations
Appears in Collections: Thesis
