## Abstract

K-Means is a well known algorithms of clusteing. It generates some groups based on degree of similarity. Simplicity of implementation, ease of interpretation, adaptability to sparse data, linear complexity, speed of convergence, and versatile in almost every aspect are noble characteristics of this algorithm. However, this algorithm is very sensitive on defining initial centroids process. Giving a bad initial centroid always produces a bad quality output. Due to this weakness, it is recommended to make some runs with different initial centroids and select the initial centroid that produces cluster with minimum error. However, this procedure is hard to achieve a satisfying result.

This paper introduces a new approach to minimize the initial centroid problem of K-Means algorithm. This approach focus on centroid updating stage in K-Means algorithm by applying minimum forest graph to produce better new centroids. Based on gain information and Dunn index values, this approach provided a better result than Forgy method when this approach tested on both well distributed and noisy dataset. Moreover, from the experiments with two dimentional data, the proposed approach produced consisten members of each cluster in every run, where it could not be found in Forgy method.

This paper introduces a new approach to minimize the initial centroid problem of K-Means algorithm. This approach focus on centroid updating stage in K-Means algorithm by applying minimum forest graph to produce better new centroids. Based on gain information and Dunn index values, this approach provided a better result than Forgy method when this approach tested on both well distributed and noisy dataset. Moreover, from the experiments with two dimentional data, the proposed approach produced consisten members of each cluster in every run, where it could not be found in Forgy method.

Original language | English |
---|---|

Journal | Informal : Informatics Journal |

DOIs | |

Publication status | Published - 2018 |