Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/23713
Title: An efficient representation model of distance distribution between uncertain objects
Authors: Hung, E
Xiao, L
Hung, RYS
Keywords: data mining
distance representation model
uncertain database
Issue Date: 2012
Publisher: Wiley-Blackwell
Source: Computational Intelligence, 2012, v. 28, no. 3, p. 373-397
Journal: Computational Intelligence 
Abstract: In this paper, we consider the problem of efficiently computing the distance between uncertain objects. In many real-life applications, data such as sensor readings and weather forecasts are usually uncertain when they are collected or produced. An uncertain object has a probability distribution function (PDF) that represents the probability that it is actually at a particular location. Fast and accurate distance computation between uncertain objects is important to many uncertain query evaluation tasks (e.g., range queries and nearest-neighbor queries) and uncertain data mining tasks (e.g., classification, clustering, and outlier detection). However, existing approaches involve distance computations between samples of two objects, which are very computationally intensive. On the one hand, it is expensive to calculate and store the actual distribution of the possible distance values between two uncertain objects. On the other hand, the expected distance (the weighted average of the pairwise distances among samples of two uncertain objects) provides very limited information and also restricts the definitions and usefulness of queries and mining tasks. In this paper, we propose several approaches to calculate the mean of the actual distance distribution and approximate its variance. Based on these, we suggest that the actual distance distribution could be approximated using a standard distribution such as a Gaussian or Gamma distribution. Experiments on real and synthetic data show that our approach produces an approximation in a very short time with acceptable accuracy (about 90%). We suggest that it is practical for the research communities to define and develop more powerful queries and data mining tasks based on the distance distribution instead of the expected distance.
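Illustration (not taken from the paper): the abstract describes the expected distance as a weighted average of pairwise sample distances, and proposes approximating the distance distribution from its mean and variance. The sketch below shows only the brute-force sample-pair baseline followed by a textbook moment match to a Gamma distribution; it is not the authors' efficient method. The function names `distance_moments` and `gamma_params`, the Euclidean metric, and the toy sample points are all assumptions made for this example.

```python
import math
from itertools import product


def distance_moments(samples_a, weights_a, samples_b, weights_b):
    """Brute-force mean and variance of the distance between two sampled
    uncertain objects (hypothetical helper, Euclidean distance assumed).

    Each object is a list of sample points with probabilities summing to 1.
    """
    mean = 0.0
    second_moment = 0.0
    pairs_a = zip(samples_a, weights_a)
    pairs_b = zip(samples_b, weights_b)
    for (p, wp), (q, wq) in product(pairs_a, pairs_b):
        d = math.dist(p, q)   # distance between one sample of each object
        w = wp * wq           # joint probability, assuming independence
        mean += w * d
        second_moment += w * d * d
    variance = max(second_moment - mean * mean, 0.0)
    return mean, variance


def gamma_params(mean, variance):
    """Moment-matched Gamma parameters: mean = k * theta, var = k * theta**2."""
    if mean <= 0 or variance <= 0:
        raise ValueError("Gamma moment matching needs positive mean and variance")
    shape = mean * mean / variance
    scale = variance / mean
    return shape, scale


# Toy one-dimensional objects, two equally likely samples each (made-up data).
a_pts, a_w = [(0.0,), (2.0,)], [0.5, 0.5]
b_pts, b_w = [(3.0,), (5.0,)], [0.5, 0.5]

mean, var = distance_moments(a_pts, a_w, b_pts, b_w)
shape, scale = gamma_params(mean, var)
print(mean, var, shape, scale)
```

Here the mean is the expected distance named in the abstract. The double loop visits every sample pair, which is the cost the paper aims to avoid.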
URI: http://hdl.handle.net/10397/23713
DOI: 10.1111/j.1467-8640.2012.00440.x
Appears in Collections: Journal/Magazine Article

Scopus citations: 4 (last week: 0; last month: 0), as of May 18, 2017
Web of Science citations: 2 (last week: 0; last month: 0), as of May 20, 2017
Page view(s): 30 (last week: 0), checked on May 21, 2017

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.