Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/24987
Title: Measuring termhood in automatic terminology extraction
Authors: Zhang, Q
Lu, Q 
Sui, Z
Keywords: Information retrieval
Natural languages
Statistical analysis
Support vector machines
Issue Date: 2007
Publisher: IEEE
Source: International Conference on Natural Language Processing and Knowledge Engineering, 2007 : NLP-KE 2007, August 30 2007-September 1 2007, Beijing, p. 328-335
Abstract: Automatic terminology extraction (ATE) can be divided into two tasks. The first task measures unithood, which is used to identify a string as a lexical unit. The second task measures so-called termhood, which is used to identify a lexical unit as a domain-specific term. This paper proposes a method to measure termhood in Chinese ATE. It considers the domain specificity of the components of a candidate term, as well as statistical information and other contextual information across different domains, and applies these to a support vector machine model for terminology extraction. The experiments are based on a Chinese corpus in the IT domain, with cross-validation on data from outside the IT domain. Results show that the precision of the open tests can reach over 80% for the top 2,000 candidates and around 50% for the top 20,000 candidates. Furthermore, experiments with different lexicon sizes show that the algorithm does not require a large, comprehensive domain lexicon. A few thousand basic domain terms are sufficient to achieve the above-mentioned performance.
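The abstract reports precision over the top-ranked candidates (top 2,000 and top 20,000). As a minimal sketch of how such a figure is commonly computed, the function below takes a list of candidates already sorted by a termhood score and counts how many of the first n are true domain terms. The function name, the toy candidate list, and the gold-standard set are hypothetical illustrations and are not taken from the paper; the paper's own scoring and evaluation procedure may differ.

```python
def precision_at_n(ranked_candidates, gold_terms, n):
    """Return the fraction of the top-n ranked candidates found in gold_terms.

    ranked_candidates: list of candidate strings, best-scoring first.
    gold_terms: set of strings judged to be true domain terms.
    If fewer than n candidates exist, the available ones are used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top = ranked_candidates[:n]
    if not top:
        return 0.0
    hits = sum(1 for candidate in top if candidate in gold_terms)
    return hits / len(top)


# Hypothetical toy data: candidates ranked by an assumed termhood score.
ranked = ["operating system", "database", "today", "compiler", "we"]
gold = {"operating system", "database", "compiler"}

p_top2 = precision_at_n(ranked, gold, 2)
p_top5 = precision_at_n(ranked, gold, 5)
```

Precision usually falls as n grows, because lower-ranked candidates are less likely to be true terms, which matches the pattern reported between the top 2,000 and the top 20,000 candidates.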
URI: http://hdl.handle.net/10397/24987
ISBN: 978-1-4244-1610-3
978-1-4244-1611-0 (E-ISBN)
DOI: 10.1109/NLPKE.2007.4368051
Appears in Collections: Conference Paper

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.