A study on discourse type based information retrieval

Wang, Dayu

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/84930

Title:	A study on discourse type based information retrieval
Authors:	Wang, Dayu
Degree:	Ph.D.
Issue Date:	2013
Abstract:	In ad hoc information retrieval (IR), some information needs (e.g., find the advantages and disadvantages of smoking) require the explicit identification of information related to a discourse type (e.g., advantages/disadvantages) as well as to the topic entity (e.g., smoking). Such information needs are not uncommon and may not be easily satisfied by conventional retrieval methods. We therefore propose retrieval methods that take the discourse type of a topic into account. We propose IU similarity models and graph-based models to compute the similarity between a part of a document (called an information unit, or IU) and a set of topic entity terms. Experimental results show that our IU similarity models perform well with different term weighting schemes and are able to overcome the difficulties caused by the small size of an IU. Our graph-based models can compute the similarity of an IU based either on topic entity terms only or on both topic entity terms and discourse type terms. In the graph-based models, the basic unit is an edge that links two terms, which may be two distinct topic entity terms or a topic entity term and a discourse type term. These two models can be regarded as baselines for IU-based retrieval that do not rely on any discourse type information. In actual documents, individual terms are sometimes not adequate to express a discourse type, so we focus on text patterns, which have greater expressive power. We use word sequences, POS-tag sequences, and mixed word/POS-tag sequences to match phrases and expressions in order to find the text patterns that relate to a specific discourse type. These text patterns can also be selected by treating the different types of sequences as features in a pattern recognition application. The selected patterns are then used to quantify whether an IU contains information on a specific discourse type.
For evaluation, we focused on discourse types that are easily identified in those TREC topics whose information needs are not well satisfied by conventional retrieval models. We evaluated discourse type based retrieval using our novel retrieval models together with text patterns mined by selection conditions or learning algorithms. We showed that our concept of discourse type and the corresponding solutions can enhance retrieval effectiveness for the selected TREC topics.
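The abstract describes three scoring ideas (IU similarity to topic entity terms, term-pair edges, and discourse type text patterns) without giving their formulas. The Python sketch below illustrates them under assumed definitions only: the function names, the tf-idf cosine weighting, the co-occurrence window, the 1/distance edge weight, and the `<TAG>` notation for POS-tag pattern elements are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical representation: an IU is a list of (word, POS tag) pairs,
# already lower-cased and tagged by some external POS tagger.
Token = Tuple[str, str]
IU = List[Token]


def iu_topic_similarity(iu: IU, topic_terms: Iterable[str],
                        idf: Dict[str, float]) -> float:
    """Cosine similarity between the IU's tf-idf vector and an idf-weighted
    vector of the topic entity terms (an assumed weighting scheme)."""
    tf = Counter(word for word, _ in iu)
    iu_vec = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    q_vec = {t: idf.get(t, 0.0) for t in set(topic_terms)}
    dot = sum(w * q_vec.get(t, 0.0) for t, w in iu_vec.items())
    norm = (math.sqrt(sum(w * w for w in iu_vec.values()))
            * math.sqrt(sum(w * w for w in q_vec.values())))
    return dot / norm if norm else 0.0


def edge_score(iu: IU, topic_terms: Iterable[str],
               discourse_terms: Iterable[str] = (), window: int = 5) -> float:
    """Sum of 1/distance over term pairs within `window` positions that form
    an edge: two distinct topic terms, or a topic term and a discourse term.
    With no discourse terms given, only topic-topic edges are counted."""
    words = [w for w, _ in iu]
    topic, disc = set(topic_terms), set(discourse_terms)
    score = 0.0
    for i, a in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            b = words[j]
            topic_pair = a in topic and b in topic and a != b
            mixed = (a in topic and b in disc) or (a in disc and b in topic)
            if topic_pair or mixed:
                score += 1.0 / (j - i)
    return score


def element_matches(elem: str, token: Token) -> bool:
    """A pattern element written "<TAG>" matches a POS tag; any other
    element must equal the word itself."""
    word, tag = token
    if elem.startswith("<") and elem.endswith(">"):
        return tag == elem[1:-1]
    return word == elem


def count_pattern(iu: IU, pattern: Sequence[str]) -> int:
    """Number of positions where a word, POS-tag, or mixed sequence matches."""
    n = len(pattern)
    if n == 0:
        return 0
    return sum(
        all(element_matches(e, tok) for e, tok in zip(pattern, iu[i:i + n]))
        for i in range(len(iu) - n + 1)
    )


def pattern_score(iu: IU, patterns: Iterable[Sequence[str]]) -> int:
    """Total number of matches of one discourse type's patterns in the IU."""
    return sum(count_pattern(iu, p) for p in patterns)


if __name__ == "__main__":
    # Toy IU and idf values, invented purely for illustration.
    iu = [("the", "DT"), ("main", "JJ"), ("advantages", "NNS"), ("of", "IN"),
          ("smoking", "NN"), ("are", "VBP"), ("few", "JJ")]
    idf = {"smoking": 2.0, "advantages": 1.5, "the": 0.1}
    print(iu_topic_similarity(iu, ["smoking"], idf))
    print(edge_score(iu, ["smoking"], ["advantages"]))
    print(pattern_score(iu, [["<JJ>", "advantages", "of"],
                             ["advantages", "of", "<NN>"]]))
```

In this sketch a mixed pattern such as `["<JJ>", "advantages", "of"]` matches "main advantages of" in the toy IU. How the thesis actually weights terms, defines edges, or combines pattern evidence with IU similarity is not stated in the abstract.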
Subjects:	Information retrieval. Discourse analysis. Hong Kong Polytechnic University -- Dissertations
Pages:	viii, 286 p. : ill. ; 30 cm.
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/7369
