Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/86771
Title: An integrated summarization framework with hierarchical content representation
Authors: Ouyang, You
Degree: Ph.D.
Issue Date: 2011
Abstract: With the rapid growth of Internet services, more and more electronic text is accessible online. While the abundance of information provides more resources for individuals, it also results in the well-recognized information overload problem: the excessive amount of information being provided. Automatic text summarization has emerged to deal with this problem. It is the process of creating a shortened version of a text by computational techniques, helping users grasp the important content of the original text(s) at an affordable time cost. According to how the summary is composed, summarization methods are either extractive or abstractive. Extractive methods are currently the mainstream and are the focus of this dissertation. The main question in extractive summarization is how to select a set of sentences from the input documents to form a summary that best conveys their important content. To answer this question, we start by discovering important words in the input documents: we propose several content models for word saliency estimation and word-based sentence ranking, and then develop two word-based summarization methods with these content models. Experimental results on several authoritative data sets from the Document Understanding Conference (DUC) tasks demonstrate the effectiveness of the proposed methods. Our next target is to incorporate the relations between important words into the summarization process. We propose several methods to identify the latent word relations in the input documents and use them to obtain a hierarchical representation of the document content. Based on this hierarchical content representation, we propose a novel hierarchical summarization method that extracts summary sentences in a general-to-specific style.
Hierarchical summarization, which has not been studied systematically in previous research, is characterized by integrating various summarization objectives to simultaneously improve the content and readability of the composed summaries. The experimental results on the DUC data sets demonstrate the advantages of the proposed method over traditional summarization methods. Finally, we conduct several tentative studies to examine the use of more sophisticated content representations beyond single words for improving the hierarchical summarization method. These studies capture several important details in developing good hierarchical summarization methods and shed light on directions for future work in hierarchical summarization.
Subjects: Automatic abstracting.
Computational linguistics.
Hong Kong Polytechnic University -- Dissertations
Pages: xiii, 172 p. : ill. ; 30 cm.
Appears in Collections: Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.