Similarity measures : algorithms and applications

Chan, Tsz Nam

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/85138

Title:	Similarity measures : algorithms and applications
Authors:	Chan, Tsz Nam
Degree:	Ph.D.
Issue Date:	2019
Abstract:	Similarity measures are basic components of many problems in image processing, computer vision, pattern recognition and machine learning. However, evaluating a similarity measure is often the bottleneck in these applications. In this thesis, we highlight three computationally intensive applications and propose efficient algorithms for each. The first application is object detection in images. Given a query image, the problem is to find the most similar sub-image within a given target image. It can be formulated as a nearest neighbor search problem and, in the context of computer vision, is also called the template matching problem. The Euclidean distance is used to measure the dissimilarity between the query image and a sub-image. However, the time complexity of object detection for each query is the product of the sizes of the sub-image and the image, which is prohibitive for fast object detection. We propose two solutions that are 9 to 20 times faster than the state-of-the-art method. The second application is image retrieval. Existing image retrieval systems extract feature histograms for all images. During the online phase, they return the k most similar images for each image query issued by the user. One robust similarity measure between two histograms is based on the Earth Mover's Distance (EMD). However, the cubic time complexity of evaluating EMD restricts its applicability to small-scale datasets. We present an approximation framework that leverages lower and upper bound functions to compute approximate EMD with an error guarantee. Under this framework, we present two solutions that significantly outperform the existing exact and heuristic solutions. Our experimental studies demonstrate that our best solution is 2.38x to 7.26x faster than the existing method. The third application is (kernel) classification.
In the machine learning context, a kernel function is a similarity measure between two multidimensional vectors, which are extracted by different feature extraction methods depending on the scenario. Many machine learning models need to compute a weighted aggregation of kernel function values between a set of multidimensional vectors and a query vector, using different types of kernel functions, for example Gaussian, polynomial or sigmoid kernels. However, computing this kernel aggregation function online is normally expensive, which limits its applicability to some real-time (e.g. network anomaly detection) or large-scale (e.g. density estimation or classification for physical modeling) applications. We propose novel and effective bounding techniques to speed up the computation of kernel aggregation. We further boost efficiency by leveraging index structures and exploiting index tuning opportunities. Experimental studies on many real datasets show that our proposed method achieves speedups of 2.5x to 738x over the state of the art.
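For readers unfamiliar with the term, the kernel aggregation mentioned in the abstract can be illustrated with a minimal sketch. It shows only the exact, brute-force computation for a Gaussian kernel, i.e. the kind of baseline that bounding and indexing techniques aim to accelerate; this record does not describe the thesis's own bounds, so none are shown. The function names, the `gamma` parameter and the sample data are assumptions made for illustration.

```python
import math

def gaussian_kernel(q, p, gamma):
    """Gaussian kernel exp(-gamma * ||q - p||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(q, p))
    return math.exp(-gamma * sq_dist)

def kernel_aggregation(q, points, weights, gamma=1.0):
    """Exact weighted sum of kernel values between q and every point.

    Cost is O(n * d) per query for n points of dimension d, which is why
    the online computation becomes expensive on large datasets.
    """
    return sum(w * gaussian_kernel(q, p, gamma) for p, w in zip(points, weights))

# Hypothetical toy data: three 2-D points with weights summing to 1.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
weights = [0.5, 0.25, 0.25]
score = kernel_aggregation((0.0, 0.0), points, weights, gamma=1.0)
```

In a classifier, a score of this kind would typically be compared against a threshold, so an approach that can bound the score tightly enough may be able to decide without evaluating every term.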
Subjects:	Hong Kong Polytechnic University -- Dissertations; Image processing -- Digital techniques; Computer algorithms; Image analysis -- Data processing
Pages:	xx, 177 pages : color illustrations
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/9915
