Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/30439
Title: Mining outliers with faster cutoff update and space utilization
Authors: Szeto, CC
Hung, E
Keywords: Disk-based algorithms
Distance-based outliers
Memory optimization
Outlier detection
Issue Date: 2010
Publisher: North-Holland
Source: Pattern recognition letters, 2010, v. 31, no. 11, p. 1292-1301
Journal: Pattern recognition letters
Abstract: Ramaswamy et al.'s distance-based outlier definition is attractive for finding unusual data objects because it requires only a metric distance function between two objects. Unlike many existing algorithms based on the definition of Knorr and Ng, it needs no neighborhood distance threshold. Bay and Schwabacher proposed ORCA, an efficient algorithm for this task that can give near-linear time performance. To further reduce the running time, this paper proposes two algorithms, RC and RS, which use the following two techniques, respectively: (i) faster cutoff update and (ii) space utilization after pruning. We tested RC, RS, and RCS (a hybrid approach combining RC and RS) on several large, high-dimensional real data sets with millions of objects. The experiments show that RCS runs 1.4 to 2.3 times as fast as ORCA, and that this improvement is relatively insensitive to increases in data size.
URI: http://hdl.handle.net/10397/30439
ISSN: 0167-8655
EISSN: 1872-7344
DOI: 10.1016/j.patrec.2010.04.002
Appears in Collections: Journal/Magazine Article
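
Illustrative sketch: the abstract refers to a distance-based outlier definition and to a pruning cutoff without spelling either out. The Python below is a minimal sketch under stated assumptions, not the RC, RS, or RCS algorithms, whose details the abstract does not give. It assumes that an object's outlier score is the distance to its k-th nearest neighbor and that the n highest-scoring objects are reported. It uses a simplified nested loop in the style usually attributed to ORCA: a candidate is dropped as soon as its running k-th nearest neighbor distance falls below the current cutoff, which is the score of the weakest of the n outliers found so far. The function name `top_n_outliers`, its parameters, and the use of Euclidean distance are hypothetical choices made for illustration.

```python
import heapq
import math
import random


def top_n_outliers(points, k, n, seed=0):
    """Return [(score, index)] for the n points with the largest k-th NN distance.

    Simplified illustration only; assumes Euclidean distance and in-memory data.
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # random scan order tends to trigger pruning early

    top = []      # min-heap of (score, index); top[0] is the weakest outlier kept so far
    cutoff = 0.0  # a candidate whose bound drops below this cannot be a top-n outlier

    for i in range(len(points)):
        nearest = []  # negated distances: a max-heap of the k smallest distances seen
        pruned = False
        for j in order:
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # Once k neighbors are seen, -nearest[0] is an upper bound on the true score.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]  # the cutoff only ever increases as better outliers arrive

    return sorted(top, reverse=True)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(data, k=2, n=1))
```

The faster the cutoff rises, the sooner the inner loop can stop for ordinary objects, which is why the abstract singles out cutoff update as a target for speeding up the search.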

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.