Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/84067
Title: Mining clusters in attributed graphs
Authors: He, Tiantian
Degree: Ph.D.
Issue Date: 2017
Abstract: Many real-world relational data sets can be modeled as graphs whose vertices and edges represent, respectively, data entities and the relationships between them. One of the most important tasks is to discover graph clusters, or communities, which are interesting subgraphs in the graph data. Many computational methods have been proposed to find such clusters. Most of the prevalent approaches discover graph clusters by taking into consideration either topological properties of the graph, e.g., density and modularity, or vertex attributes. However, few effective computational approaches consider both topology and attributes when discovering clusters in graphs. In this thesis, we propose to discover graph clusters using the Attributed Graph, which contains a set of vertices, a set of edges, and attributes that are associated with the vertices. By combining the edge structure with the attributes, a computational method can discover clusters in the attributed graph while taking both into consideration. Based on the Attributed Graph, we propose four different algorithms. Each of these four algorithms has its own unique characteristics and may address existing challenges in graph clustering. To discover interesting subgraphs in which vertices are inter-related, we propose an algorithm for identifying interesting subgraphs that makes use of both edge structure and the degree of attribute association between pairwise vertices (MISAGA). MISAGA formulates the task of discovering k subgraphs as a constrained optimization problem and solves it by identifying the optimal affiliation of vertices with subgraphs through an iterative updating algorithm. In each of the interesting subgraphs found by MISAGA, vertices are densely connected and their attribute values are significantly associated, although their attribute values might not be the same.
As there are no highly effective graph clustering algorithms based on fuzzy set theory, we propose an algorithm for discovering fuzzy structural patterns in attributed graphs (FSPGA). FSPGA adopts an effective fuzzy clustering framework that allows overlapping clusters to be identified. As the clusters identified in some real applications, e.g., functional modules in biological graphs, need to be connected components, we further propose two more algorithms, called EGCPI and TBPCI, for identifying clusters of interest. Unlike other approaches, EGCPI formulates the task of discovering clusters in the attributed graph as an optimization problem and tackles it with evolutionary clustering. It can identify those subgraphs in which vertices are densely connected and their attributes are more similar. TBPCI identifies clusters using local information about vertex connectedness and the attribute association between pairwise vertices in the attributed graph. TBPCI can compute the optimal degree of boundedness between each pair of vertices, which captures how strongly the vertices can be considered bounded together. Clusters can then be identified by grouping those vertices whose shared degrees of boundedness are sufficiently strong. The proposed algorithms have been used in different real-world applications, including community detection in social network graphs and functional module identification in biological network graphs. The experimental results show that the proposed algorithms outperform state-of-the-art approaches.
Subjects: Hong Kong Polytechnic University -- Dissertations
Data mining
Graph theory -- Data processing
Graph algorithms
Pages: xviii, 146 pages : color illustrations
Appears in Collections: Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.