Title: How far we can go with extractive text summarization? Heuristic methods to obtain near upper bounds
Authors: Wang, WM
Li, Z
Wang, JW
Zheng, ZH 
Keywords: Extractive text summarization
Upper bound construction
Ideal extracts construction
Summarization evaluation
Issue Date: 2017
Publisher: Pergamon Press
Source: Expert systems with applications, 2017, v. 90, p. 439-463
Journal: Expert systems with applications 
Abstract: Extractive text summarization is an effective way to automatically reduce a text to a summary by selecting a subset of the text. The performance of a summarization system is usually evaluated by comparing its output with human-constructed extractive summaries provided in annotated text datasets. However, for datasets in which an abstract is written for readers, the system is evaluated against an abstract that a human has written in his or her own words. This makes it difficult to determine how far state-of-the-art extractive methods are from the upper bound that an ideal extractive method might achieve. In addition, the performance of an extractive method differs from domain to domain, which makes it difficult to benchmark. Previous studies construct an ideal sentence-based extract of a document, i.e., the one that gives the best score on a given metric, by exhaustive search over all possible sentence combinations of a given length. They then use the performance of that extract as the sentence-based upper bound. However, this only applies to short texts. For long texts and multiple documents, previous studies rely on manual effort, which is expensive and time-consuming. In this paper, we propose nine fast heuristic methods to generate near-ideal sentence-based extracts for long texts and multiple documents. Furthermore, we propose an n-gram construction method to construct the word-based upper bound. A percentage ranking method is used to benchmark different extractive methods across different corpora. In the experiments, five different corpora are used. The results show that the near upper bounds constructed by the proposed methods are close to those obtained by exhaustive search, but the proposed methods are much faster. Six general extractive summarization methods were also assessed to demonstrate the difference between the performance of the methods and the near upper bounds.
ISSN: 0957-4174
EISSN: 1873-6793
DOI: 10.1016/j.eswa.2017.08.040
Appears in Collections: Journal/Magazine Article
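
Illustrative note: the sentence-based upper bound mentioned in the abstract (exhaustive search over all sentence combinations of a given length) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the authors' implementation. The scoring function is a simplified clipped unigram recall standing in for whatever metric is actually used, the greedy routine is a generic heuristic and not one of the nine methods the paper proposes, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unigram_recall(candidate_sentences, reference):
    """Clipped unigram recall of the candidate against the reference.

    Assumption: a simplified stand-in for a ROUGE-1-style metric,
    using lowercase whitespace tokenization.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(" ".join(candidate_sentences).lower().split())
    overlap = sum(min(n, cand_counts[w]) for w, n in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0


def exhaustive_upper_bound(sentences, reference, k):
    """Score every k-sentence combination and keep the best one.

    The number of combinations grows combinatorially with the number
    of sentences, which is why this is only practical for short texts.
    """
    best_score, best_extract = -1.0, ()
    for combo in combinations(sentences, k):
        score = unigram_recall(combo, reference)
        if score > best_score:
            best_score, best_extract = score, combo
    return best_score, list(best_extract)


def greedy_near_upper_bound(sentences, reference, k):
    """Generic greedy heuristic (hypothetical, not from the paper):
    repeatedly add the sentence that gives the highest score so far."""
    chosen, remaining = [], list(sentences)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: unigram_recall(chosen + [s], reference))
        chosen.append(best)
        remaining.remove(best)
    return unigram_recall(chosen, reference), chosen


if __name__ == "__main__":
    # Toy data invented for illustration only.
    doc = [
        "the cat sat on the mat",
        "dogs bark loudly",
        "a cat likes the mat",
        "it rained today",
    ]
    abstract = "the cat likes the mat"
    print(exhaustive_upper_bound(doc, abstract, 2))
    print(greedy_near_upper_bound(doc, abstract, 2))
```

The greedy variant scores on the order of k times n candidate extracts instead of all n-choose-k combinations, which illustrates the kind of speed-versus-optimality trade-off the abstract describes. A greedy result can never exceed the exhaustive one under the same metric.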

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.