Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/21231
Title: Discovering "title-like" terms
Authors: Wong, CWY
Luk, RWP 
Ho, EKS
Keywords: Classification
Induction
Term extraction
Issue Date: 2005
Publisher: Pergamon Press
Source: Information processing and management, 2005, v. 41, no. 4, p. 789-800
Journal: Information processing and management 
Abstract: This paper examines the feasibility of discovering "title-like" terms in a document using a decision tree classifier. The premise is that title terms and title-like terms should behave similarly in the document. This behavior is characterized by a set of distributional and linguistic features. By training the classifier on the behavior of title terms in a balanced manner, using 25,000 titles of Reuters articles, other terms with similar behavior can also be discovered. On 5000 unseen titles, the recall of title terms was 83%, similar to that of manual identification of title terms. The precision of finding title terms was low (32%) because the classifier also identifies terms that are not in the title but are title-like, and these count as errors under this measure. Seven subjects were asked to rate, on a scale of 1 to 5, whether each identified term is a topical/thematic/title term. If a rating of 2.5 is used as the threshold for judging a term to be "title-like", the mean precision increases to 58%, or the headline/title is expanded with twice the average number of terms. Since this precision (58%) is similar to the mean precision of manually identified title terms averaged across subjects, we conclude that discovering title-like terms using classifiers is a promising approach.
URI: http://hdl.handle.net/10397/21231
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2004.05.007
Appears in Collections:Journal/Magazine Article
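
The precision and recall figures quoted in the abstract can be illustrated with a minimal Python sketch. The terms, ratings, and function names below are hypothetical and are not data or code from the paper. The sketch only shows why precision rises when extracted terms that raters score above the 2.5 threshold are also counted as correct. Treating the threshold as "greater than or equal to" is an assumption.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for a collection of extracted terms."""
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def title_like_terms(mean_ratings, threshold=2.5):
    """Terms whose mean subject rating meets the threshold (>= is an assumption)."""
    return {term for term, rating in mean_ratings.items() if rating >= threshold}


# Hypothetical example: one article, its title terms, and the classifier output.
title_terms = {"oil", "prices"}
extracted = {"oil", "prices", "opec", "barrel", "said", "week"}

# Hypothetical mean ratings (scale of 1 to 5) averaged over the subjects.
mean_ratings = {
    "oil": 4.6, "prices": 4.1, "opec": 3.9,
    "barrel": 2.0, "said": 1.1, "week": 1.4,
}

# Strict evaluation: only terms that appear in the title count as correct.
strict_p, strict_r = precision_recall(extracted, title_terms)

# Relaxed evaluation: terms rated as title-like also count as correct.
relaxed_relevant = title_terms | title_like_terms(mean_ratings)
relaxed_p, _ = precision_recall(extracted, relaxed_relevant)

print(f"strict precision={strict_p:.2f} recall={strict_r:.2f}")
print(f"relaxed precision={relaxed_p:.2f}")
```

In this toy case the term "opec" is not in the title but is rated as title-like, so it counts as an error under the strict measure and as a correct term under the relaxed one.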
