Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/43717
Title: Correlation-assisted nearest shrunken centroid classifier with applications for high dimensional spectral data
Authors: Xu, J
Xu, Q
Yi, L
Chan, CO
Mok, DKW 
Keywords: Classification
Principal component analysis
Soft independent modeling of class analogy
Issue Date: 2016
Publisher: John Wiley & Sons
Source: Journal of chemometrics, 2016, v. 30, no. 1, p. 37-45
Journal: Journal of chemometrics 
Abstract: High throughput data are frequently observed in contemporary chemical studies. Classification through spectral information is an important issue in chemometrics. Linear discriminant analysis (LDA) fails in the large-p-small-n situation for two main reasons: (1) the sample covariance matrix is singular when p > n and (2) there is an accumulation of noise in the estimation of the class centroid in high dimensional feature space. The Independence Rule is a class of methods used to overcome these drawbacks by ignoring the correlation information between spectral variables. However, a strong correlation is an essential characteristic of spectral data. We proposed a new correlation-assisted nearest shrunken centroid classifier (CA-NSC) to incorporate correlation information into the classification. CA-NSC combines two sources of information [class centroid (mean) and correlation structure (variance)] to generate the classification. We used two real data analyses and a simulation study to verify our CA-NSC method. In addition to NSC, we also performed a comparison with the soft independent modeling of class analogy (SIMCA) approach, which uses only correlation structure information for classification. The results show that CA-NSC consistently improves on NSC and SIMCA. The misclassification rate of CA-NSC is reduced by almost half compared with NSC in one of the real data analyses. Generally, correlation among variables will worsen the performance of NSC, even though the discriminatory information contained in the class centroid remains unchanged. If only correlation structure information is used (as in the case of SIMCA), the result will be satisfactory only when the correlation structure alone can provide sufficient information for classification.
URI: http://hdl.handle.net/10397/43717
ISSN: 0886-9383
DOI: 10.1002/cem.2768
Appears in Collections: Journal/Magazine Article
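
The abstract above builds on the nearest shrunken centroid (NSC) classifier. As a rough illustration of that baseline only, the following is a minimal sketch, not the CA-NSC method of the paper and not the authors' code. It assumes a simplified NSC that soft-thresholds each class centroid toward the overall centroid and omits the per-feature standardization used in the standard formulation. The function names, the threshold `delta`, and the toy data are all hypothetical.

```python
def soft_threshold(value, delta):
    """Shrink value toward zero by delta; anything within delta of zero becomes 0."""
    if value > delta:
        return value - delta
    if value < -delta:
        return value + delta
    return 0.0


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_shrunken_centroids(samples, labels, delta):
    """Return one shrunken centroid per class (simplified: no standardization)."""
    overall = mean_vector(samples)
    centroids = {}
    for label in sorted(set(labels)):
        rows = [x for x, y in zip(samples, labels) if y == label]
        class_mean = mean_vector(rows)
        # Shrink the class-vs-overall difference, feature by feature.
        centroids[label] = [
            o + soft_threshold(c - o, delta) for c, o in zip(class_mean, overall)
        ]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose shrunken centroid is nearest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
samples = [[1.0, 0.1], [1.2, -0.1], [-1.0, 0.0], [-1.2, 0.0]]
labels = ["a", "a", "b", "b"]
centroids = fit_shrunken_centroids(samples, labels, delta=0.5)
prediction = predict(centroids, [0.9, 0.3])
```

In this toy case the noise feature is shrunk to the overall centroid for both classes, so only the first feature contributes to the decision. Because distances are computed feature by feature, the sketch ignores correlation between variables, which is the information the abstract says CA-NSC adds.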