Title: Distributional similarity model for multi-modality clustering in social media
Authors: Sze, CM
Fu, TC
Chung, FL 
Luk, R 
Keywords: Social Media Analysis; Multi-Modality Clustering; Distributional Features
Issue Date: 2007
Publisher: IEEE
Source: 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology Workshops, 5-12 November 2007, Silicon Valley, CA, p. 268-271
Abstract: User-generated content (UGC) has become the fastest-growing sector of the WWW. Data mining from UGC presents challenges not typically found in text mining from documents. UGC can be semi-structured, and its content can be very short and informal, containing relatively little content, similar to a chat or an email conversation. In addition, UGC can be viewed as multi-modality data. These characteristics pose significant challenges and open research questions. To cluster UGC data, we can construct multiple contingency tables of modalities and employ the multi-way distributional clustering (MDC) algorithm. However, a contingency table that summarizes the co-occurrence statistics of two modalities does not robustly represent the information entropy between those modalities in UGC data. In this paper, we propose a novel similarity measurement, called the distributional similarity model (DSM), to strengthen the graph model in the MDC algorithm so that it can deal with the unique characteristics of UGC data.
ISBN: 0-7695-3028-1
DOI: 10.1109/WI-IATW.2007.105
Appears in Collections: Conference Paper

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.