Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/85141
DC Field | Value | Language
dc.contributor | Department of Computing | -
dc.creator | Xu, Jian | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/7776 | -
dc.language.iso | English | -
dc.title | Named entity disambiguation from web text | -
dc.type | Thesis | -
dcterms.abstract | Named entity disambiguation is the problem of grouping name mentions into clusters, with each cluster referring to the same underlying entity. In this thesis, we focus on named entity disambiguation from web text, because finding information about people on the Internet is one of the most common activities of online users. Personal names, however, are highly ambiguous, with a large number of people sharing the same name. Named entity disambiguation is therefore increasingly important for many applications, such as information retrieval, question answering, cross-document co-reference, and relation discovery. This motivates our study of named entity disambiguation over the Internet. In general, named entity disambiguation for web text includes two tasks: (1) Web Person Disambiguation (WPD), which groups search results into clusters, with each cluster referring to the same person; and (2) personal profile extraction (PPE), which helps build the relational information for the person in each cluster. The main challenges in named entity disambiguation include (1) how to select meaningful features that uniquely identify named entities; (2) how to guarantee high performance in WPD even when there is no prior knowledge of the number of persons sharing the same name; and (3) how to obtain and select quality training data from an external knowledge base for PPE, since manually annotated data is costly to produce and limited in scale. In this thesis, we explore the use of more semantically relevant information for named entity disambiguation on web text. For WPD, our supervised approach makes good use of naturally annotated resources, Wikipedia in particular, to alleviate manual annotation effort and domain dependence problems. We also investigate the use of keywords as semantically more meaningful information units for WPD.
Based on meaningful keyword features, we investigate a hierarchical co-reference resolution technique to place ambiguous person names into different clusters. Our disambiguation method does not require a predefined number of persons and can produce good-quality clusters for each person. For PPE, we build a personalized profile by identifying relational facts. Our approach incorporates two semantic constraints, trigger words and entity types, which help reduce noisy data in profile extraction. Both WPD and PPE are built within the framework of graphical models, which provide a sequential structure for semantic feature extraction and a tree structure for both name disambiguation and profile extraction. The methods in this thesis are evaluated on publicly available datasets so that performance can be compared with state-of-the-art work, and our approach is shown to be effective in named entity disambiguation. | -
dcterms.accessRights | open access | -
dcterms.educationLevel | Ph.D. | -
dcterms.extent | xii, 148 pages : color illustrations ; 30 cm | -
dcterms.issued | 2014 | -
dcterms.LCSH | Text processing (Computer science) | -
dcterms.LCSH | Natural language processing (Computer science) | -
dcterms.LCSH | Names. | -
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | -
Appears in Collections: Thesis
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.