Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/28320
Title: Development of an intelligent distributed news retrieval system
Authors: Liu, JNK; Choi, KC; Chai, JY
Keywords: Distributed news retrieval; Intelligent system; MapReduce; Web crawler
Issue Date: 2012
Source: International journal of knowledge-based and intelligent engineering systems, 2012, v. 16, no. 2, p. 129-140
Journal: International Journal of Knowledge-Based and Intelligent Engineering Systems 
Abstract: Currently available web news retrieval systems face a number of problems because web-based news retrieval requires the ability to quickly and accurately process a very large amount of data that is constantly being updated. In this paper, we present the development of an intelligent distributed web news retrieval system whose goal is to accurately retrieve and organize web news information. It includes: a novel optimized crawler algorithm whose fetching speed is several times faster than that of a traditional crawler; a keen tag-based extraction algorithm, which extracts data-rich content with minimal manual effort and classifies data as important or not important so that the crawler can revisit and update important data; and a modified MapReduce, improved by estimating the execution time of each subtask, which is proven to reduce the number of unusual tasks and shorten the overall job execution time.
URI: http://hdl.handle.net/10397/28320
ISSN: 1327-2314
DOI: 10.3233/KES-2011-0237
Appears in Collections: Journal/Magazine Article

Scopus citations: 2 (last week: 0; last month: 0), as of Aug 14, 2017
Page views: 59 (last week: 5), checked on Aug 13, 2017

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.