CLS and CLS close: The scalable method for mining the semi structured data set

Ford Lumban Gaol, Belawati H. Widjaja

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Semistructured patterns can be formally modeled as graph patterns. The most important problem in mining large semistructured datasets is the scalability of the method. With the successful development of efficient and scalable algorithms for mining frequent itemsets and sequences, it is natural to extend the scope of study to a more general pattern mining problem: mining frequent semistructured patterns, or graph patterns. In this paper, we extend the pattern-growth methodology and develop a novel algorithm called CLS (Canonical Labeling System), which discovers frequent connected subgraphs efficiently using either a depth-first or a breadth-first search strategy. A novel canonical labeling system and search order are devised to support efficient pattern growth. CLS is simpler and more efficient than other methods because it combines pattern growing and pattern checking into one procedure. Based on CLS, we develop CLS Close to mine closed frequent graphs, which not only eliminates redundant patterns but also substantially increases mining efficiency, especially in the presence of large graph patterns.
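The abstract does not spell out the CLS code itself, so the following is only a rough illustration of what a canonical label does in graph mining in general, not the authors' labeling scheme. It assumes small undirected graphs with string vertex and edge labels; the function name `canonical_label` and the brute-force strategy are hypothetical choices made for this sketch. The point it shows is that two isomorphic patterns map to the same code, so a miner can recognize a pattern it has already generated.

```python
from itertools import permutations


def canonical_label(vertex_labels, edges):
    """Return a canonical code for a small labeled undirected graph.

    vertex_labels: list of string labels, indexed by vertex id 0..n-1
    edges: iterable of (u, v, edge_label) tuples with string labels

    Illustrative brute force (an assumption of this sketch, not CLS):
    try every vertex ordering and keep the lexicographically smallest
    code. The cost is factorial in n, so this is for tiny patterns only.
    """
    n = len(vertex_labels)
    edge_map = {}
    for u, v, lab in edges:
        edge_map[(u, v)] = lab
        edge_map[(v, u)] = lab  # undirected: store both directions

    best = None
    for perm in permutations(range(n)):
        # perm[i] is the original vertex placed at position i
        labels = tuple(vertex_labels[p] for p in perm)
        # Upper triangle of the adjacency matrix; "" marks a missing edge
        adjacency = tuple(
            edge_map.get((perm[i], perm[j]), "")
            for i in range(n)
            for j in range(i + 1, n)
        )
        code = (labels, adjacency)
        if best is None or code < best:
            best = code
    return best


# Two drawings of the same path C-O-C, with different vertex numbering
g1 = canonical_label(["C", "O", "C"], [(0, 1, "s"), (1, 2, "s")])
g2 = canonical_label(["O", "C", "C"], [(0, 1, "s"), (0, 2, "s")])
# A different pattern: the path C-C-O
g3 = canonical_label(["C", "C", "O"], [(0, 1, "s"), (1, 2, "s")])

print(g1 == g2)  # isomorphic graphs share one canonical code
print(g1 == g3)  # non-isomorphic graphs do not
```

A practical miner would replace the exhaustive search with an ordering that is built incrementally during pattern growth, which is the role the abstract assigns to the CLS search order.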

Original language: English
Title of host publication: Innovations and Advanced Techniques in Systems, Computing Sciences and Software Engineering
Pages: 186-191
Number of pages: 6
Publication status: Published - 1 Dec 2008
Event: 2007 International Conference on Systems, Computing Sciences and Software Engineering, SCSS 2007, Part of the International Joint Conferences on Computer, Information, and Systems Sciences, and Engineering, CISSE 2007 - Bridgeport, CT, United States
Duration: 3 Dec 2007 to 12 Dec 2007

Publication series

Name: Innovations and Advanced Techniques in Systems, Computing Sciences and Software Engineering

Conference

Conference: 2007 International Conference on Systems, Computing Sciences and Software Engineering, SCSS 2007, Part of the International Joint Conferences on Computer, Information, and Systems Sciences, and Engineering, CISSE 2007
Country/Territory: United States
City: Bridgeport, CT
Period: 3/12/07 to 12/12/07

Keywords

  • Canonical label
  • Closed pattern
  • CLS code
  • Frequent pattern
  • Graph mining
