Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/16597
Title: On the use of hierarchical information in sequential mining-based XML document similarity computation
Authors: Leung, HP
Chung, FL 
Chan, SCF 
Keywords: Information retrieval
Sequential mining
XML structural similarity
Issue Date: 2005
Publisher: Springer
Source: Knowledge and information systems, 2005, v. 7, no. 4, p. 476-498
Journal: Knowledge and information systems 
Abstract: Measuring the structural similarity among XML documents is the task of finding their semantic correspondence and is fundamental to many web-based applications. While there exist several methods to address the problem, the data mining approach seems to be a novel, interesting and promising one. It explores the idea of extracting paths from XML documents, encoding them as sequences and finding the maximal frequent sequences using the sequential pattern mining algorithms. In view of the deficiencies encountered by ignoring the hierarchical information in encoding the paths for mining, a new sequential pattern mining scheme for XML document similarity computation is proposed in this paper. It makes use of a preorder tree representation (PTR) to encode the XML tree's paths so that both the semantics of the elements and the hierarchical structure of the document can be taken into account when computing the structural similarity among documents. In addition, it proposes a postprocessing step to reuse the mined patterns to estimate the similarity of unmatched elements so that another metric to qualify the similarity between XML documents can be introduced. Encouraging experimental results were obtained and reported. Springer-Verlag London Ltd.
URI: http://hdl.handle.net/10397/16597
ISSN: 0219-1377
EISSN: 0219-3116
DOI: 10.1007/s10115-004-0156-7
Appears in Collections: Journal/Magazine Article
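
The path-extraction idea described in the abstract can be illustrated with a minimal sketch. It walks an XML tree in preorder, records each root-to-leaf path, and tags every element with its depth so that hierarchical position is kept alongside the element name. The function names `preorder_paths` and `path_overlap`, the (tag, depth) encoding, and the Jaccard overlap are assumptions made for illustration only; they are not the paper's PTR encoding, and the sequential pattern mining step is not reproduced here.

```python
import xml.etree.ElementTree as ET


def preorder_paths(xml_text):
    """Return root-to-leaf paths; each step is a (tag, depth) pair.

    Hypothetical encoding: the depth stands in for the hierarchical
    information that a plain tag sequence would lose.
    """
    root = ET.fromstring(xml_text)
    paths = []

    def visit(node, prefix, depth):
        step = prefix + [(node.tag, depth)]
        children = list(node)
        if not children:
            # A leaf closes one root-to-leaf path.
            paths.append(tuple(step))
        for child in children:
            # Visiting children in document order yields a preorder walk.
            visit(child, step, depth + 1)

    visit(root, [], 0)
    return paths


def path_overlap(paths_a, paths_b):
    """Jaccard overlap of two path sets (an illustrative stand-in metric)."""
    set_a, set_b = set(paths_a), set(paths_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


if __name__ == "__main__":
    doc_a = "<book><title/><author><name/></author></book>"
    doc_b = "<book><title/><author><email/></author></book>"
    print(preorder_paths(doc_a))
    print(path_overlap(preorder_paths(doc_a), preorder_paths(doc_b)))
```

In this sketch two documents count as similar only where whole encoded paths coincide. The approach in the abstract instead mines maximal frequent sequences, which can also capture partial matches between paths.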
