Clustering performance using k-modes with modified entropy measure for breast cancer

Nurshazwani Muhamad Mahfuz, Heru Suhartanto, Kusmardi Kusmardi, Marina Yusoff

Research output: Contribution to journal › Article › peer-review

Abstract

Breast cancer is a serious disease that requires data analysis for diagnosis and treatment. Clustering is a data mining technique that is often used in breast cancer research to assess the level of malignancy at an early stage. However, clustering categorical data can be challenging because different levels in categorical variables can impact the clustering process. This research proposes a modified entropy measure (MEM) to enhance clustering performance. MEM aims to address the limitations of distance-based measures in clustering categorical data. It is also a useful tool for assessing data loss in categorical clustering: by quantifying the information lost during clustering, it helps to understand the patterns and relationships in the data. An evaluation compares k-modes+MEM, k-means+MEM, DBSCAN+MEM, and affinity+MEM with conventional clustering algorithms. Clustering accuracy, intra-cluster distance, and the Fowlkes-Mallows index (FMI) are used to evaluate algorithm performance. Experimental results show significant improvements. The k-modes+MEM algorithm achieves a reduction in average intra-cluster distance and outperforms the other algorithms in accuracy, intra-cluster distance, and FMI. The proposed algorithm can be extended to heterogeneous datasets in various domains such as healthcare, finance, and marketing.
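
As an illustration of two of the evaluation metrics named in the abstract, the following minimal Python sketch computes the Fowlkes-Mallows index from pair counts and an average intra-cluster distance for categorical records. The abstract does not define MEM or the exact distance used, so the sketch makes assumptions: it uses plain Hamming distance to each cluster's mode (as in standard k-modes), and the function names `fowlkes_mallows`, `cluster_mode`, and `avg_intra_cluster_distance` are hypothetical, not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def fowlkes_mallows(true_labels, pred_labels):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)), counted over pairs of records."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both labelings
        elif same_pred:
            fp += 1  # pair grouped together only by the clustering
        elif same_true:
            fn += 1  # pair grouped together only by the ground truth
    denom = sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom else 0.0


def cluster_mode(rows):
    """Most frequent category of each attribute (the k-modes 'centroid')."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def avg_intra_cluster_distance(rows, labels):
    """Mean Hamming distance from each record to its cluster's mode.

    Assumption: Hamming distance stands in for the paper's measure,
    which is not specified in the abstract.
    """
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    total = 0
    for members in clusters.values():
        mode = cluster_mode(members)
        total += sum(sum(a != b for a, b in zip(row, mode)) for row in members)
    return total / len(rows)
```

A lower average intra-cluster distance indicates tighter clusters, while an FMI closer to 1 indicates closer agreement between the clustering and the reference labels.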

Original language: English
Pages (from-to): 1150-1158
Number of pages: 9
Journal: Indonesian Journal of Electrical Engineering and Computer Science
Volume: 32
Issue number: 2
Publication status: Published - Nov 2023

Keywords

  • Categorical data
  • Clustering
  • Distance metric
  • Entropy measure
  • Evaluation performance
