A novel centroid initialization in missing value imputation towards mixed datasets

Titin Siswantining, Taufik Anwar, Devvi Sarwinda, Herley Shaori Al-Ash

Research output: Contribution to journal › Article › peer-review

Abstract

Many databases, especially in the medical domain, contain missing values. Statistical and data mining methods often require complete data and perform poorly when values are missing. Several techniques have been developed to handle missing values, one of which is deleting the records that contain them. However, this approach discards a great deal of information when many records are incomplete. This study instead uses imputation (filling in the missing attributes) based on clustering. One of the most common clustering methods is K-means clustering, in which each centroid is computed from the observations closest to it. In this study, we propose updating the centroid value based on the harmonic average of the distances across all observations per centroid. This method is known as K-Harmonic Means Clustering (KHM). We propose a new approach for mixed datasets and evaluate it on three scenarios with 10%, 20%, and 30% missing values. In experiments on data sets containing missing values, a small proportion of missing values (10%) combined with a small number of clusters K gives a smaller RMSE than the other scenarios.
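The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' program: it assumes purely numeric data (categorical attributes in a mixed dataset would need their own distance and are not handled here), uses the standard K-harmonic means centroid update with an assumed exponent `p = 3.5`, starts from a column-mean fill, and imputes each missing entry from the nearest centroid. The function names `khm_centroids`, `khm_impute`, and `rmse` are hypothetical.

```python
import numpy as np


def khm_centroids(X, k, p=3.5, n_iter=50, seed=0, eps=1e-8):
    """K-harmonic means on a complete numeric matrix X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centroids are k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Euclidean distance from every row to every centroid, floored at eps
        # so that a row coinciding with a centroid does not divide by zero.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        d = np.maximum(d, eps)
        # KHM weight of row i for centroid j:
        #   d_ij^-(p+2) / (sum_l d_il^-p)^2
        w = d ** (-(p + 2)) / np.sum(d ** (-p), axis=1, keepdims=True) ** 2
        # Each centroid is a weighted mean over ALL rows, not only its own cluster.
        centroids = (w.T @ X) / w.sum(axis=0)[:, None]
    return centroids


def khm_impute(X_missing, k, **kwargs):
    """Fill NaN entries using the coordinates of the nearest KHM centroid."""
    X = np.asarray(X_missing, dtype=float)
    mask = np.isnan(X)
    # Assumption: temporary fill with the column means of the observed entries.
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    centroids = khm_centroids(filled, k, **kwargs)
    # Nearest centroid per row, measured on the observed coordinates only.
    diff = np.where(mask[:, None, :], 0.0, filled[:, None, :] - centroids[None, :, :])
    nearest = np.argmin((diff ** 2).sum(axis=2), axis=1)
    filled[mask] = centroids[nearest][mask]
    return filled


def rmse(true, imputed, mask):
    """Root mean squared error over the entries that were masked out."""
    return float(np.sqrt(np.mean((true[mask] - imputed[mask]) ** 2)))


if __name__ == "__main__":
    # Synthetic stand-in data: two Gaussian groups, then 10%, 20%, 30% masking.
    rng = np.random.default_rng(42)
    X_true = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(8, 1, (50, 4))])
    for rate in (0.1, 0.2, 0.3):
        mask = rng.random(X_true.shape) < rate
        X_miss = X_true.copy()
        X_miss[mask] = np.nan
        print(rate, rmse(X_true, khm_impute(X_miss, k=2), mask))
```

Because every observation contributes to every centroid through the harmonic weights, the update tends to be less sensitive to the initial centroids than plain K-means. The masking loop mirrors the three missing-value scenarios only in spirit; the data here are synthetic and say nothing about the paper's reported results.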

Original language: English
Article number: 11
Pages (from-to): 1-36
Number of pages: 36
Journal: Communications in Mathematical Biology and Neuroscience
Volume: 2021
Publication status: Published - 2021

Keywords

  • Clustering
  • Harmonic series
  • Imputation
  • K-means
  • Mixed dataset
