Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/29760
Title: Discovering interesting molecular substructures for molecular classification
Authors: Lam, WWM
Chan, KCC 
Keywords: Frequent subgraph
Graph mining
Interestingness
Molecular classification
Molecular structures
Issue Date: 2010
Publisher: Institute of Electrical and Electronics Engineers
Source: IEEE transactions on nanobioscience, 2010, v. 9, no. 2, 5477194, p. 77-89
Journal: IEEE transactions on nanobioscience 
Abstract: Given a set of molecular structure data preclassified into a number of classes, the molecular classification problem is concerned with the discovery of interesting structural patterns in the data so that unseen molecules not originally in the dataset can be accurately classified. To tackle the problem, interesting molecular substructures have to be discovered, and this is typically done by first representing molecular structures as molecular graphs and then using graph-mining algorithms to discover frequently occurring subgraphs in them. These subgraphs are then used to characterize different classes for molecular classification. While such an approach can be very effective, it should be noted that a substructure that occurs frequently in one class may also occur frequently in another. The discovery of frequent subgraphs for molecular classification may, therefore, not always be the most effective. In this paper, we propose a novel technique called mining interesting substructures in molecular data for classification (MISMOC) that can discover interesting frequent subgraphs not just for the characterization of a molecular class but also for distinguishing it from the others. Using a test statistic, MISMOC screens each frequent subgraph to determine whether it is interesting. For those that are interesting, their degrees of interestingness are determined using an information-theoretic measure. When classifying an unseen molecule, its structure is matched against the interesting subgraphs in each class, and a total interestingness measure for classifying the unseen molecule into a particular class is determined based on the interestingness of each matched subgraph. The performance of MISMOC is evaluated using both artificial and real datasets, and the results show that it can be an effective approach for molecular classification.
URI: http://hdl.handle.net/10397/29760
ISSN: 1536-1241
EISSN: 1558-2639
DOI: 10.1109/TNB.2010.2042609
Appears in Collections: Journal/Magazine Article
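
The screen-then-score pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the test statistic or the information-theoretic measure, so the adjusted residual of a 2x2 contingency table and a smoothed log-likelihood ratio ("weight of evidence") are used here as stand-ins. The sketch also assumes that frequent subgraphs have already been mined, so each molecule is given simply as a set of hypothetical substructure labels. All function names, the threshold `z_threshold`, and the smoothing constant `eps` are illustrative assumptions.

```python
import math
from collections import defaultdict


def adjusted_residual(n_both, n_pattern, n_class, n_total):
    """Adjusted residual of the (pattern present, class) cell in a 2x2 table.

    Assumed stand-in for the unspecified test statistic in the abstract.
    """
    expected = n_pattern * n_class / n_total
    variance = expected * (1 - n_pattern / n_total) * (1 - n_class / n_total)
    if variance <= 0:
        return 0.0  # pattern (or class) covers everything: no evidence either way
    return (n_both - expected) / math.sqrt(variance)


def weight_of_evidence(n_both, n_pattern, n_class, n_total, eps=0.5):
    """Smoothed log ratio of P(pattern | class) to P(pattern | other classes).

    Assumed stand-in for the unspecified information-theoretic measure.
    """
    p_in = (n_both + eps) / (n_class + 2 * eps)
    p_out = (n_pattern - n_both + eps) / (n_total - n_class + 2 * eps)
    return math.log(p_in / p_out)


def train(molecules, labels, z_threshold=1.96):
    """Keep only patterns that pass the screen, with a weight per class."""
    n_total = len(molecules)
    class_count = defaultdict(int)
    pattern_count = defaultdict(int)
    joint = defaultdict(int)
    for patterns, label in zip(molecules, labels):
        class_count[label] += 1
        for p in patterns:
            pattern_count[p] += 1
            joint[(p, label)] += 1

    weights = {c: {} for c in class_count}
    for p, n_p in pattern_count.items():
        for c, n_c in class_count.items():
            n_both = joint.get((p, c), 0)
            z = adjusted_residual(n_both, n_p, n_c, n_total)
            if abs(z) > z_threshold:  # screening step
                weights[c][p] = weight_of_evidence(n_both, n_p, n_c, n_total)
    return weights


def classify(patterns, weights):
    """Sum the weights of matched patterns per class; pick the largest total."""
    scores = {
        c: sum(w for p, w in pattern_weights.items() if p in patterns)
        for c, pattern_weights in weights.items()
    }
    return max(scores, key=scores.get), scores


# Toy data: "X" is frequent in both classes, "A" and "B" are class-specific.
molecules = [{"A", "X"}] * 10 + [{"B", "X"}] * 10
labels = ["active"] * 10 + ["inactive"] * 10
model = train(molecules, labels)
predicted, totals = classify({"A", "X"}, model)
```

In this toy data the label "X" occurs in every molecule of both classes, so it is frequent but is screened out, which mirrors the abstract's point that frequency alone does not make a substructure useful for distinguishing classes.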



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.