Tree stream mining algorithm with Chernoff-bound and standard deviation approach for big data stream

Ari Wibisono, Devvi Sarwinda, Petrus Mursanto

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

We propose a Chernoff-bound approach and examine the standard deviation value to enhance the accuracy of the existing fast incremental model trees with drift detection (FIMT-DD) algorithm. FIMT-DD is a data stream mining algorithm that can observe a large dataset and build a model tree from it. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion. We propose using a simplified Chernoff bound instead, so that the tree is split more accurately than with the Hoeffding bound. We verify the proposed Chernoff bound and standard deviation value examination by evaluating several large real-world datasets that consist of 100,000,000 instances. The results show that the proposed method improves accuracy compared to the Hoeffding bound in the FIMT-DD algorithm. Error value measurements demonstrate that the Chernoff bound approach yields smaller error values than the standard method. In terms of overall simulation time, the Chernoff bound approach takes longer and uses more memory than the Hoeffding bound. However, the Chernoff bound reaches a smaller error much faster than the standard algorithm.
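To illustrate the kind of splitting test the abstract refers to, the following is a minimal sketch, not the authors' implementation. The Hoeffding formula is the standard one. The Chernoff formula is a textbook two-sided multiplicative form and may differ from the simplified bound used in the paper. The function names, the `tie_threshold` parameter, and its default value are assumptions made for illustration only.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability 1 - delta, the mean of n
    independent observations with range `value_range` lies within epsilon
    of its expectation."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def chernoff_epsilon(mean: float, delta: float, n: int) -> float:
    """Textbook two-sided multiplicative Chernoff bound for the mean of n
    independent variables in [0, 1] with expectation `mean`:
        P(|avg - mean| >= eps) <= 2 * exp(-n * eps**2 / (3 * mean)),
    valid while eps <= mean. Assumption: this is NOT necessarily the
    simplified form used in the paper."""
    return math.sqrt(3.0 * mean * math.log(2.0 / delta) / n)


def should_split(ratio: float, epsilon: float, tie_threshold: float = 0.05) -> bool:
    """Hypothetical split test in the style of Hoeffding-tree learners.
    `ratio` is the second-best split score divided by the best one, so it
    lies in [0, 1]. Split when the best candidate is confidently better,
    or when the bound is tight enough that the candidates are tied."""
    return ratio + epsilon < 1.0 or epsilon < tie_threshold


if __name__ == "__main__":
    delta = 1e-6
    for n in (1_000, 100_000, 10_000_000):
        h = hoeffding_epsilon(1.0, delta, n)
        c = chernoff_epsilon(0.1, delta, n)
        print(f"n={n:>10,}  hoeffding={h:.5f}  chernoff={c:.5f}")
```

Both bounds shrink as more instances arrive, so a split that is ambiguous early in the stream can become admissible later. Which bound is tighter depends on the assumed mean and confidence level, so the printed values should not be read as reproducing the paper's results.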

Original language: English
Article number: 58
Journal: Journal of Big Data
Volume: 6
Issue number: 1
Publication status: Published - 1 Dec 2019

Keywords

  • Big data
  • Chernoff bound
  • Data stream
  • FIMT-DD
  • Intelligent systems
  • Standard deviation
