Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/94890
Title: Linear discriminant analysis with high dimensional mixed variables
Authors: Yang, Zhongqing
Degree: Ph.D.
Issue Date: 2022
Abstract: With the rapid development of modern measurement technologies, datasets containing both discrete and continuous variables are increasingly common in many areas, and the dimensions of both types of variables are often very high. Discriminant analysis for mixed variables has been well studied under the traditional fixed-dimension setting. Despite recent progress in modelling high-dimensional continuous data, few methods can handle a mixed set of variables. To fill this gap, this thesis develops a novel, simple yet useful classification rule for high-dimensional observations that addresses both the high dimensionality and the mixed structure of the variables simultaneously.
In Chapters 2 and 3, we introduce our framework, which builds on a location model in which the distributions of the continuous variables conditional on the categorical ones are assumed to be Gaussian. We use kernel smoothing to overcome the challenge of having to split the data into exponentially many cells, that is, combinations of the categorical variables. We also provide new perspectives on the bandwidth choice that ensure an analogue of Bochner's Lemma, a criterion that differs from the usual bias-variance tradeoff. We show that the two sets of parameters in our model can be estimated separately and provide a penalized likelihood method for their estimation.
In Chapter 4, we present the theoretical results. Efficient direct estimation schemes are developed to obtain consistent estimators of the discriminant components.
In Chapter 5, we conduct simulation studies to investigate the performance of the proposed semiparametric location model. Results on the estimation accuracy and the misclassification rates are established, and the competitive performance of the proposed classifier is illustrated by extensive simulation and real-data studies.
Subjects: Variables (Mathematics)
Dimensional analysis
Mathematical models
Hong Kong Polytechnic University -- Dissertations
Pages: xviii, 78 pages : color illustrations
Appears in Collections: Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.