Scale-driven clustering of geographical point data

Liu, Qiliang

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/85776

Title:	Scale-driven clustering of geographical point data
Authors:	Liu, Qiliang
Degree:	Ph.D.
Issue Date:	2015
Abstract:	Clustering is a technique for grouping similar observations into clusters or categories. It plays a key role in geographical data analysis, e.g. for investigating the distribution of geographical data and observing the characteristics of clusters. Clustering also sometimes serves as an important pre-processing step for other data analysis techniques. A number of methods have been developed for clustering geographical data; however, two limitations remain. First, although many researchers have recognized that clusters discovered from a geographical dataset are scale-dependent, most existing methods simply confirm whether or not a set of geographical data forms a cluster and are unable to detect multi-scale clusters. Second, a user-specified threshold is usually used to determine whether a set of geographical data can be identified as a cluster, so the significance of the discovered clusters cannot be evaluated objectively. This study therefore aims to tackle these two problems by mimicking the human perception of grouping at multiple scales. A scale-driven strategy is proposed to detect multi-scale, statistically significant clusters from the most popular type of geographical data, i.e. geographical point data. Specifically, scale in clustering is first defined in terms of data (sampling) scale and analysis (model) scale. Hypothesis testing is then used to establish the relationship between these two kinds of scale and to evaluate the significance of the clusters discovered at continuous analysis scales. Finally, scale is explicitly modeled as a parameter of the clustering model. Based on the proposed strategy, a specific scale-driven clustering model is developed for each of four popular forms of geographical point data, i.e. spatial points, spatio-temporal events, spatial points with attributes, and spatio-temporal variables.
A scale-driven clustering model is developed for the discovery of density-based clusters from spatial points. A statistical method based on the Delaunay triangulation network is developed to select the analysis scale adaptively. A method based on the Natural Principle and an iterative detection-and-removal method are proposed to control the data scale. Experiments on both simulated and real-life datasets show that, compared with existing algorithms, only the proposed model is able to detect multi-scale clusters that are consistent with human perception. A scale-driven clustering model is also proposed for detecting density-based dynamic clusters from spatio-temporal events. A method based on spatio-temporal classification entropy and stability analysis is developed to identify the optimal analysis scale. Experiments on both simulated and earthquake datasets show that, compared with existing algorithms, the proposed model not only correctly discovers clusters with different shapes and densities but also reduces to a minimum the subjectivity involved in setting user-specified parameters. A scale-driven clustering model is developed for detecting connectivity-based clusters formed by spatially contiguous objects with similar attribute values. Clusters at continuous analysis scales are discovered by minimizing the reduction in the degree of homogeneity within clusters. A permutation test is proposed to identify significant clusters obtained at continuous analysis scales. Experiments on both simulated and meteorological datasets show that the proposed model not only correctly discovers clusters consistent with human perception but also overcomes an inherent difficulty of existing hierarchical clustering algorithms, i.e. the lack of a properly defined stopping criterion. To detect clusters formed by neighbouring spatio-temporal variables with similar attribute values, a scale-driven clustering model is constructed.
A fast permutation test that exploits topological relationships is proposed to identify significant clusters discovered at continuous analysis scales. Experiments on both simulated and temperature datasets show that, compared with existing algorithms, the proposed model is more efficient and effective at detecting significant clusters at continuous analysis scales. In summary, this study aims to detect significant clusters at multiple scales for different applications. To achieve this, a scale-driven strategy is proposed and scale is explicitly represented in the parameterization of the clustering models. Experimental results show that multi-scale significant clusters with different sizes, shapes and densities can be easily discovered by controlling the scale parameters, and that the subjectivity in clustering geographical data is significantly reduced.
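The abstract does not give the details of the permutation tests used in the thesis. As an illustration only, the general idea of a permutation test for cluster significance can be sketched as follows, assuming the test statistic is the within-cluster variance of an attribute and the null hypothesis is that attribute values are assigned to locations at random. The function name `cluster_permutation_pvalue`, its parameters and the sample values are hypothetical and are not taken from the thesis.

```python
import random
from statistics import pvariance


def cluster_permutation_pvalue(values, cluster_idx, n_perm=999, seed=0):
    """Monte Carlo p-value for a candidate cluster's homogeneity.

    Assumption (not from the thesis): a cluster is "significant" when its
    within-cluster variance is unusually low compared with variances of
    equally sized groups drawn at random from all attribute values.
    """
    rng = random.Random(seed)
    k = len(cluster_idx)
    observed = pvariance([values[i] for i in cluster_idx])

    hits = 0
    for _ in range(n_perm):
        # Draw k values without replacement: a random relabelling of locations.
        permuted = rng.sample(values, k)
        if pvariance(permuted) <= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical attribute values: five near-identical readings plus a wide spread.
tight = [10.0, 10.1, 9.9, 10.05, 9.95]
spread = [20.0 + 5 * i for i in range(15)]
values = tight + spread

p_tight = cluster_permutation_pvalue(values, cluster_idx=[0, 1, 2, 3, 4])
p_scattered = cluster_permutation_pvalue(values, cluster_idx=[0, 5, 10, 15, 19])
```

Under these assumptions a small p-value suggests the candidate cluster is more homogeneous than random grouping would explain, which replaces a user-specified homogeneity threshold with a significance level. This sketch ignores spatial contiguity and the topological speed-up mentioned in the abstract.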
Subjects:	Cluster analysis. Spatial analysis (Statistics). Geography -- Statistical methods. Hong Kong Polytechnic University -- Dissertations.
Pages:	xv, 194 pages : illustrations (some color) ; 30 cm
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/7953
