Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/17762
Title: Contrastive approach towards text source classification based on top-bag-of-word similarity
Authors: Huang, CR 
Lee, LH
Keywords: Chinese gigaword
Comparable corpus
Contrastive approach
Text source classification
Top-bag-of-word similarity
Issue Date: 2008
Source: Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, PACLIC 22, 2008, p. 404-410
Abstract: This paper proposes a method to automatically classify texts from different varieties of the same language. We show that a similarity measure is a robust tool for studying comparable corpora of language variations. We take LDC's Chinese Gigaword Corpus, composed of three varieties of Chinese from Mainland China, Singapore, and Taiwan, as the comparable corpora. Top-bag-of-word similarity measures reflect distances among the three varieties of the same language. A top-bag-of-word similarity-based contrastive approach was taken to solve the text source classification problem. Our results show that a contrastive approach, using similarity to rule out identity of source and to arrive at the actual source by inference, is more robust than direct confirmation of source by similarity. We show that this approach is robust when applied to other texts.
Description: 22nd Pacific Asia Conference on Language, Information and Computation, PACLIC 22, Cebu, 20-22 November 2008
URI: http://hdl.handle.net/10397/17762
Appears in Collections: Conference Paper
