Breast cancer is a serious disease whose diagnosis and treatment rely on data analysis. Clustering is a data mining technique often used in breast cancer research to assess the level of malignancy at an early stage. However, clustering categorical data is challenging because the different levels of categorical variables can affect the clustering process. This research proposes a modified entropy measure (MEM) to enhance clustering performance. MEM aims to address the limitations of distance-based measures when clustering categorical data. It also serves as a tool for assessing data loss in categorical clustering: by quantifying the information lost during clustering, it helps reveal the patterns and relationships in the data. An evaluation compares k-modes+MEM, k-means+MEM, DBSCAN+MEM, and affinity+MEM with their conventional counterparts. Clustering accuracy, intra-cluster distance, and the Fowlkes-Mallows index (FMI) are used to evaluate algorithm performance. Experimental results show significant improvements: the k-modes+MEM algorithm reduces the average intra-cluster distance and outperforms the other algorithms in accuracy, intra-cluster distance, and FMI. The proposed algorithm can be extended to heterogeneous datasets in domains such as healthcare, finance, and marketing.
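As a rough illustration of two ideas mentioned above, the following sketch computes a plain Shannon entropy over the categorical attributes within each cluster and the Fowlkes-Mallows index between two labelings. This is not the MEM proposed in the paper, whose definition is not given here; the function names (`column_entropy`, `expected_cluster_entropy`, `fowlkes_mallows`) and the toy records are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt


def column_entropy(values):
    """Shannon entropy (in bits) of one categorical column."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def expected_cluster_entropy(rows, labels):
    """Size-weighted sum of per-attribute entropies inside each cluster.

    Standard Shannon entropy, used here as a stand-in; it is NOT the MEM
    from the paper. Lower values mean more homogeneous clusters.
    """
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    n = len(rows)
    total = 0.0
    for members in clusters.values():
        # zip(*members) iterates over the attribute columns of the cluster
        h = sum(column_entropy(col) for col in zip(*members))
        total += len(members) / n * h
    return total


def fowlkes_mallows(true_labels, pred_labels):
    """Fowlkes-Mallows index: TP / sqrt((TP + FP) * (TP + FN)) over pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Made-up toy records with two categorical attributes (illustration only).
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
truth = [0, 0, 1, 1]
pure_clusters = [0, 0, 1, 1]
mixed_clusters = [0, 1, 0, 1]

pure_entropy = expected_cluster_entropy(rows, pure_clusters)
mixed_entropy = expected_cluster_entropy(rows, mixed_clusters)
pure_fmi = fowlkes_mallows(truth, pure_clusters)
mixed_fmi = fowlkes_mallows(truth, mixed_clusters)
```

In this toy case the clustering that groups identical records has lower entropy and a higher FMI than the one that mixes them, which is the kind of agreement between an information-based measure and an external index that the evaluation relies on.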
Journal: Indonesian Journal of Electrical Engineering and Computer Science
Published: Nov 2023
Keywords: Categorical data, Clustering, Distance metric, Entropy measure, Evaluation performance