Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/24190
Title: Using complex linguistic features in context-sensitive text classification techniques
Authors: Wong, AKS
Lee, JWT
Yeung, DS
Keywords: Classification
Computational linguistics
Context-sensitive languages
Text analysis
Issue Date: 2005
Publisher: IEEE
Source: Proceedings of 2005 International Conference on Machine Learning and Cybernetics, 2005, 18-21 August 2005, Guangzhou, China, v. 5, p. 3183-3188
Abstract: Text classification (TC) is the task of automatically classifying documents based on learned document features. Many popular TC models use the simple occurrence of words in a document as features. They also commonly assume word occurrences to be statistically independent in their design. Although it is obvious that such an assumption does not hold in general, these TC models have been robust and efficient in their task. Some recent studies have shown that context-sensitive TC approaches, which take into consideration contexts in the form of word co-occurrences, have been able to perform better in general. On the other hand, there have been many studies on the use of complex linguistic or semantic features instead of simple word occurrences as features for information retrieval and classification tasks. While these complex features may intuitively have more relevance to the tasks concerned, results of these studies on their effectiveness have been mixed and inconclusive. In this paper we present our investigation into the use of some complex linguistic features with a context-sensitive TC method. Our experimental results show some potential advantages of such an approach.
URI: http://hdl.handle.net/10397/24190
ISBN: 0-7803-9091-1
DOI: 10.1109/ICMLC.2005.1527491
Appears in Collections: Conference Paper

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.