Abstract
We propose a Chernoff-bound approach, combined with an examination of the standard deviation value, to improve the accuracy of the existing fast incremental model tree with drift detection (FIMT-DD) algorithm. FIMT-DD is a data stream mining algorithm that can observe a large dataset and build a model tree from it. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion. We propose using a simplified Chernoff bound instead, so that the tree is split more accurately than with the Hoeffding bound. We verify the proposed Chernoff bound and standard deviation examination by evaluating several large real-world datasets with 100,000,000 instances. The results show that the proposed method is more accurate than the Hoeffding bound in the FIMT-DD algorithm: error measurements demonstrate that the Chernoff-bound approach yields smaller error values than the standard method. In terms of overall cost, the Chernoff-bound approach increases simulation time and uses more memory than the Hoeffding bound. However, it reaches a smaller error much sooner than the standard algorithm.
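For readers unfamiliar with the baseline, the following is a minimal sketch of a Hoeffding-bound split test of the kind the standard algorithm relies on. It uses the textbook Hoeffding bound and assumes the split decision compares the ratio of the two best candidate split scores against 1; the function names, the `delta` default, and the score ratio are illustrative assumptions, not taken from the paper. The abstract does not state the form of the simplified Chernoff bound, so it is not reproduced here.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Textbook Hoeffding bound: with probability 1 - delta, the mean of n
    observations lies within epsilon of the true mean of a variable whose
    values span `value_range`."""
    if n <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("n must be positive and delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_score: float, second_score: float, n: int,
                 delta: float = 0.01) -> bool:
    """Hypothetical split test: split when the ratio of the second-best to
    the best candidate score is below 1 by more than epsilon.
    The ratio is assumed to lie in [0, 1], so value_range is 1."""
    if best_score <= 0.0:
        return False  # no useful candidate split yet
    ratio = second_score / best_score
    epsilon = hoeffding_bound(1.0, delta, n)
    return ratio + epsilon < 1.0


if __name__ == "__main__":
    # The bound tightens as more instances arrive from the stream.
    for n in (100, 1_000, 10_000):
        print(n, round(hoeffding_bound(1.0, 0.01, n), 4))
    print(should_split(0.5, 0.4, n=1_000))
```

Swapping in a different bound only changes how `epsilon` is computed; the comparison itself stays the same, which is why the choice of bound directly affects when the tree splits.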
Original language | English |
---|---|
Article number | 58 |
Journal | Journal of Big Data |
Volume | 6 |
Issue number | 1 |
Publication status | Published - 1 Dec 2019 |
Keywords
- Big data
- Chernoff bound
- Data stream
- FIMT-DD
- Intelligent systems
- Standard deviation