Named entity disambiguation from web text

Xu, Jian

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/85141

DC Field	Value	Language
dc.contributor	Department of Computing	-
dc.creator	Xu, Jian	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/7776	-
dc.language.iso	English	-
dc.title	Named entity disambiguation from web text	-
dc.type	Thesis	-
dcterms.abstract	Named entity disambiguation is the problem of grouping name mentions into clusters, with each cluster referring to the same underlying entity. In this thesis, we focus on named entity disambiguation from web text, because finding information about people on the Internet is one of the most common activities of online users. Person names, however, are highly ambiguous, with a large number of people sharing the same name. Named entity disambiguation is therefore increasingly important for applications such as information retrieval, question answering, cross-document co-reference, and relation discovery. This motivates our study of named entity disambiguation over the Internet. In general, named entity disambiguation for web text includes two tasks: (1) Web Person Disambiguation (WPD), which groups search results into clusters, with each cluster referring to the same person; and (2) personal profile extraction (PPE), which helps build the relational information for each person in a cluster. The main challenges in named entity disambiguation are (1) how to select meaningful features that uniquely identify named entities; (2) how to guarantee high performance in WPD even when there is no prior knowledge of the number of persons sharing the same name; and (3) how to obtain and select quality training data from an external knowledge base for PPE, since manually annotated data is costly to produce and limited in scale. In this thesis, we explore the use of more semantically relevant information for named entity disambiguation on web text. For WPD, our supervised approach makes good use of naturally annotated resources, Wikipedia in particular, to alleviate manual annotation effort and domain dependence problems. We also investigate the use of keywords as semantically more meaningful information units for WPD.
Based on meaningful keyword features, we investigate a hierarchical co-reference resolution technique to place ambiguous person names into different clusters. Our disambiguation method does not require a predefined number of persons and can produce good quality clusters for each person. For PPE, we build a personalized profile by identifying relational facts. Our approach incorporates two semantic constraints, trigger words and entity types, which help reduce noisy data in profile extraction. Both WPD and PPE are built within the framework of graphical models, which provide sequential structure for semantic feature extraction and tree structure for both name disambiguation and profile extraction. The methods in this thesis are evaluated on publicly available datasets so that performance can be compared with state-of-the-art work, and the results show our approach to be effective for named entity disambiguation.	-
dcterms.accessRights	open access	-
dcterms.educationLevel	Ph.D.	-
dcterms.extent	xii, 148 pages : color illustrations ; 30 cm	-
dcterms.issued	2014	-
dcterms.LCSH	Text processing (Computer science)	-
dcterms.LCSH	Natural language processing (Computer science)	-
dcterms.LCSH	Names.	-
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	-
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/7776
