Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/100773
Title: Mining significant association rules from uncertain data
Authors: Zhang, A 
Shi, W 
Webb, GI
Issue Date: Jul-2016
Source: Data mining and knowledge discovery, July 2016, v. 30, no. 4, p. 928-963
Abstract: In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is a persistent barrier to obtaining reliable and useful results. The statistically sound technique for evaluating the statistical significance of association rules is superior in preventing spurious rules, yet it can also cause severe loss of true rules in the presence of data error. This study presents a new and improved method for statistically testing association rules on uncertain, erroneous data. An original mathematical model was established to describe how data error propagates through the computational procedures of the statistical test. Based on this error model, a scheme combining analytic and simulative processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the true rules lost to data error under the original statistically sound method (that is, it reduces type-2 error). Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust to inaccurate information about data error probabilities and to situations that violate the commonly accepted assumption that the error probabilities of different data items are independent. The method is particularly effective for rules that are the most practically meaningful yet sensitive to data error. The method shows promise for enhancing the value of association rule mining results and helping users make correct decisions.
Keywords: Association rules
Pattern discovery
Statistical evaluation
Uncertain data
Publisher: Springer
Journal: Data mining and knowledge discovery 
ISSN: 1384-5810
DOI: 10.1007/s10618-015-0446-6
Rights: © The Author(s) 2016
This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s10618-015-0446-6.
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: Zhang_Mining_Significant_Association.pdf
Description: Pre-Published version
Size: 1.3 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.