Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/19449
Title: A scalable intelligent non-content-based spam-filtering framework
Authors: Hu, Y
Guo, C
Ngai, EWT 
Liu, M
Chen, S
Keywords: Email header
Email server
N-Gram algorithm
Scalable Intelligent Hybrid Spam-Filtering
Issue Date: 2010
Publisher: Pergamon Press
Source: Expert systems with applications, 2010, v. 37, no. 12, p. 8557-8565
Journal: Expert systems with applications 
Abstract: Designing a spam-filtering system that can run efficiently on heavily burdened servers is particularly important to widely used email service providers (ESPs) (e.g., Hotmail, Yahoo, and Gmail), which have to deal with millions of emails every day. The two primary challenges these companies face in spam filtering are efficiency and scalability. This study is undertaken to develop an efficient and scalable spam-filtering framework for heavily burdened email servers. We propose an Intelligent Hybrid Spam-Filtering Framework (IHSFF) that detects spam by analyzing only email headers. This framework is especially suitable for giant email servers because of its efficiency and scalability. The proposed filtering system may be deployed alone or in conjunction with other filters. We extract five features from the email header, namely "originator field", "destination field", "X-Mailer field", "sender server IP address" and "mail subject". Email subjects are digitized using an algorithm based on n-grams for better performance. Moreover, using real-world data from a well-known ESP in China, we employ various machine-learning algorithms to test the model. Experimental results show that the framework using the Random Forest algorithm achieves good accuracy, recall, precision, and F-measure. With the addition of the MetaCost framework, the model performs stably and incurs small costs in various cost-sensitive scenarios.
URI: http://hdl.handle.net/10397/19449
ISSN: 0957-4174
EISSN: 1873-6793
DOI: 10.1016/j.eswa.2010.05.020
Appears in Collections: Journal/Magazine Article
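
The header-only feature extraction described in the abstract can be illustrated with a minimal sketch. The abstract names the five features but does not specify how they are parsed or how the n-gram algorithm works, so everything below is an assumption made for illustration: the function names (`extract_header_features`, `char_ngrams`, `first_received_ip`), the use of character trigrams for the subject, and the choice to read the sender server IP from a bracketed address in the `Received` header are all hypothetical and not taken from the paper.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text.

    Character trigrams are an assumption; the paper's own n-gram
    algorithm is not described in the abstract.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def first_received_ip(msg):
    """Return the first bracketed IPv4 address found while scanning the
    Received headers from the top, or None if there is none."""
    for line in msg.get_all("Received") or []:
        match = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", line)
        if match:
            return match.group(1)
    return None


def extract_header_features(raw_email, n=3):
    """Collect the five header features named in the abstract.

    Only headers are read; the message body is never inspected.
    """
    msg = message_from_string(raw_email)
    return {
        "originator": parseaddr(msg.get("From", ""))[1],
        "destination": parseaddr(msg.get("To", ""))[1],
        "x_mailer": msg.get("X-Mailer", ""),
        "sender_ip": first_received_ip(msg),
        "subject_ngrams": char_ngrams(msg.get("Subject", ""), n),
    }


# Hypothetical sample message using reserved example domains and addresses.
SAMPLE = (
    "Received: from mail.example.com (mail.example.com [192.0.2.10])\n"
    "\tby mx.example.org; Fri, 1 Jan 2010 00:00:00 +0000\n"
    "From: Alice <alice@example.com>\n"
    "To: bob@example.org\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "Subject: Free offer\n"
    "\n"
    "Body is ignored.\n"
)
```

Calling `extract_header_features(SAMPLE)` returns a dictionary with one entry per feature. To train a classifier such as Random Forest, these values would still need to be encoded as a fixed-length numeric vector; that encoding step is not shown here because the abstract does not describe it.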


Scopus citations: 19 (as of Aug 17, 2017)
Web of Science citations: 13 (as of Jul 28, 2017)
Page views: 43 (checked on Aug 13, 2017)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.