Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/15855
Title: Fuzzy clustering and relevance ranking of web search results with differentiating cluster label generation
Authors: Matsumoto, T
Hung, E
Keywords: Internet
Fuzzy set theory
Pattern clustering
Public domain software
Search engines
Trees (mathematics)
Issue Date: 2010
Publisher: IEEE
Source: 2010 IEEE International Conference on Fuzzy Systems (FUZZ), 18-23 July 2010, Barcelona, p. 1-8
Abstract: This paper introduces a prototype web search results clustering engine that enhances search results by performing fuzzy clustering on web documents returned by conventional search engines, as well as ranking the results and labeling the resulting clusters. This is done using a fuzzy transduction-based clustering algorithm (FTCA), which employs a transduction-based relevance model (TRM) to generate document relevance values. These relevance values are used to cluster similar documents, rank them, and support a term-frequency-based label generator. The membership degrees of documents in fuzzy clusters also facilitate effective detection and removal of overly similar clusters. FTCA is compared against two established web document clustering algorithms, Suffix Tree Clustering (STC) and Lingo, both provided by the free, open-source Carrot2 Document Clustering Workbench. To measure cluster quality, an extended version of the classic precision measure is used to take relevance and fuzzy clustering into account, along with recall and F1 score. Results from testing on five different datasets show that FTCA holds a considerable advantage in clustering quality and performance over STC and Lingo in most cases.
URI: http://hdl.handle.net/10397/15855
ISBN: 978-1-4244-6919-2
ISSN: 1098-7584
DOI: 10.1109/FUZZY.2010.5584771
Appears in Collections: Conference Paper
