Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/4931
Title: An integrated summarization framework with hierarchical content representation
Authors: Ouyang, You
Keywords: Automatic abstracting; Computational linguistics; Hong Kong Polytechnic University -- Dissertations
Issue Date: 2011
Publisher: The Hong Kong Polytechnic University
Abstract: With the rapid growth of Internet services, more and more electronic text is accessible online. While this abundance of information gives individuals more resources, it also causes the well-recognized information overload problem: far more information is provided than a reader can process. Automatic text summarization has emerged to deal with this problem. It is the process of creating a shortened version of a text by computational techniques, helping users grasp the important content of the original text(s) in a reasonable amount of time. Depending on how the summary is composed, summarization methods are either extractive or abstractive. Extractive methods are currently the mainstream and are the focus of this dissertation. The main question in extractive summarization is how to select a set of sentences from the input documents to form a summary that best conveys their important content. We begin by discovering the important words in the input documents: we propose several content models for word saliency estimation and word-based sentence ranking, and then develop two word-based summarization methods using these content models. Experimental results on several authoritative data sets from the Document Understanding Conference (DUC) tasks demonstrate the effectiveness of the proposed methods. Our next target is to incorporate the relations between important words into the summarization process. We propose several methods to identify the latent word relations in the input documents and use them to obtain a hierarchical representation of the document content. Based on this hierarchical content representation, we propose a novel hierarchical summarization method that extracts summary sentences in a general-to-specific order.
Hierarchical summarization, which has not been studied systematically in previous research, is characterized by integrating various summarization objectives to improve the content and the readability of the composed summaries simultaneously. Experimental results on the DUC data sets demonstrate the advantages of the proposed method over traditional summarization methods. Finally, we conduct several preliminary studies that examine the use of more sophisticated content representations, beyond single words, for improving the hierarchical summarization method. These studies capture several important details in developing good hierarchical summarization methods and shed light on directions for future work in hierarchical summarization.
Description: xiii, 172 p. : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577P COMP 2011 Ouyang
URI: http://hdl.handle.net/10397/4931
Rights: All rights reserved.
Appears in Collections: Thesis

Files in This Item:
File                 Description                      Size      Format
b24625152_link.htm   For PolyU Users                  162 B     HTML
b24625152_ir.pdf     For All Users (Non-printable)    1.51 MB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.