Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/6781
Title: A study on discourse type based information retrieval
Authors: Wang, Dayu
Keywords: Information retrieval; Discourse analysis; Hong Kong Polytechnic University -- Dissertations
Issue Date: 2013
Publisher: The Hong Kong Polytechnic University
Abstract: In ad hoc information retrieval (IR), some information needs (e.g., find the advantages and disadvantages of smoking) require the explicit identification of information related to the discourse type (e.g., advantages/disadvantages) as well as to the topic entity (e.g., smoking). Such information needs are not uncommon and may not be easily satisfied by conventional retrieval methods. We therefore propose retrieval methods that consider the discourse type of topics. We propose IU similarity models and graph-based models to compute the similarity between a part of a document (called an information unit, or IU for short) and a set of topic entity terms. Experimental results show that our IU similarity models perform well under different term weighting schemes and are able to overcome the difficulties caused by the small size of an IU. We also propose graph-based models that can compute the similarity of an IU based on topic entity terms only, or on both topic entity terms and discourse-type-based terms. In the graph-based models, the basic unit is an edge that links two terms, which may be two distinct topic entity terms or a topic entity term and a discourse type term. These two models can be regarded as baselines for IU-based retrieval that do not rely on any discourse type information. In actual documents, some individual terms are not adequate to express a discourse type, so we focus on text patterns that have greater expressive power. We use word sequences, POS-tag sequences and a mix of both to match phrases and expressions in order to find the text patterns that relate to a specific discourse type. These text patterns can also be selected by treating the different types of sequences as features in a pattern recognition application. The text patterns are used to quantify whether an IU contains information on a specific discourse type.
For evaluation, we focused on discourse types that can easily be identified in the TREC topics that are not served well by conventional retrieval models. We evaluated discourse-type-based retrieval using our novel retrieval models together with text patterns mined by selection conditions or learning algorithms. The results show that our concept of discourse type and the corresponding solutions are able to enhance retrieval effectiveness for the selected TREC topics.
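As a rough illustration of the idea described in the abstract, the following Python sketch scores an information unit (IU) by combining topic entity term overlap with a match against a mixed word/POS-tag pattern. It is not the thesis's actual model: the function names, the binary discourse score, the multiplicative combination, the hand-tagged tokens and the example pattern are all assumptions made for illustration only.

```python
# Illustrative sketch only; names, scoring and data are hypothetical,
# not taken from the thesis.

def matches_at(tagged, start, pattern):
    """Check whether `pattern` matches the (word, tag) pairs starting at `start`.

    Each pattern element is ("word", w) to match a lowercased word,
    or ("pos", t) to match a POS tag.
    """
    if start + len(pattern) > len(tagged):
        return False
    for (word, tag), (kind, value) in zip(tagged[start:], pattern):
        if kind == "word" and word.lower() != value:
            return False
        if kind == "pos" and tag != value:
            return False
    return True


def contains_pattern(tagged, pattern):
    """Return True if the pattern occurs contiguously anywhere in the IU."""
    last_start = len(tagged) - len(pattern)
    return any(matches_at(tagged, i, pattern) for i in range(last_start + 1))


def score_iu(tagged, topic_terms, patterns):
    """Combine topic term coverage with a binary discourse-type signal.

    Assumption: the product of the two scores is used, so an IU must
    mention the topic entity AND match a discourse pattern to score > 0.
    """
    words = {word.lower() for word, _ in tagged}
    topic = len(words & topic_terms) / len(topic_terms) if topic_terms else 0.0
    discourse = 1.0 if any(contains_pattern(tagged, p) for p in patterns) else 0.0
    return topic * discourse


# Hand-tagged example IUs (Penn Treebank style tags, assigned manually here).
topic_terms = {"smoking"}
advantage_patterns = [[("word", "advantage"), ("word", "of"), ("pos", "VBG")]]

iu_both = [("Smoking", "NN"), ("has", "VBZ"), ("the", "DT"),
           ("advantage", "NN"), ("of", "IN"), ("reducing", "VBG"), ("stress", "NN")]
iu_topic_only = [("Smoking", "NN"), ("is", "VBZ"), ("common", "JJ")]
iu_pattern_only = [("Exercise", "NN"), ("has", "VBZ"), ("the", "DT"),
                   ("advantage", "NN"), ("of", "IN"), ("improving", "VBG"), ("health", "NN")]

for iu in (iu_both, iu_topic_only, iu_pattern_only):
    print(score_iu(iu, topic_terms, advantage_patterns))
```

Only the first IU, which both names the topic entity and matches the discourse pattern, receives a non-zero score. In practice the tags would come from a POS tagger and the patterns from the mining or learning step the abstract mentions.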
Description: viii, 286 p. : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577P COMP 2013 WangD
URI: http://hdl.handle.net/10397/6781
Rights: All rights reserved.
Appears in Collections: Thesis

Files in This Item:
File                 Description                      Size      Format
b26818103_link.htm   For PolyU Users                  203 B     HTML
b26818103_ir.pdf     For All Users (Non-printable)    4.21 MB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.