Please use this identifier to cite or link to this item:
Title: One story, one flow : Hidden Markov Story Models for multilingual multidocument summarization
Authors: Fung, P
Ngai, G 
Issue Date: 2006
Source: ACM Transactions on speech and language processing, 2006, v. 3, no. 2, p. 1-16
Abstract: This article presents a multidocument, multilingual, theme-based summarization system based on modeling text cohesion (story flow). Conventional extractive summarization systems which pick out salient sentences to include in a summary often disregard any flow or sequence that might exist between these sentences. We argue that such inherent text cohesion exists and is (1) specific to a particular story and (2) specific to a particular language. Documents within the same story, and in the same language, share a common story flow, and this flow differs across stories, and across languages. We propose using Hidden Markov Models (HMMs) as story models. An unsupervised segmental K-means method is used to iteratively cluster multiple documents into different topics (stories) and learn the parameters of parallel Hidden Markov Story Models (HMSM), one for each story. We compare story models within and across stories and within and across languages (English and Chinese). The experimental results support our “one story, one flow” and “one language, one flow” hypotheses. We also propose a Naïve Bayes classifier for document summarization. The performance of our summarizer is superior to conventional methods that do not incorporate text cohesion information. Our HMSM method also provides a simple way to compile a single metasummary for multiple documents from individual summaries via state labeled sentences.
Keywords: Multilingual document summarization
Hidden Markov models
Journal: ACM Transactions on speech and language processing 
Appears in Collections:Journal/Magazine Article

View full-text via PolyU eLinks SFX Query
Show full item record

Page view(s)

Last Week
Last month
Citations as of Sep 14, 2020

Google ScholarTM


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.