TY - GEN
T1 - A study on missing values imputation using K-Harmonic means algorithm
T2 - International Conference on Science and Applied Science 2019, ICSAS 2019
AU - Anwar, Taufik
AU - Siswantining, Titin
AU - Sarwinda, Devvi
AU - Soemartojo, Saskya Mary
AU - Bustamam, Alhadi
N1 - Publisher Copyright: © 2019 Author(s).
PY - 2019/12/27
Y1 - 2019/12/27
N2 - Data cleaning is one step in preprocessing, and missing values are often found in the dataset during this step. A missing value is the absence of a data item for a subject. A quick way to handle missing values is to remove the records that contain them, but this can reduce the information in the data. Another way is imputation, either with the mean, median, or mode, or with methods such as regression, likelihood, and the clustering approach. Imputation with the clustering approach is the focus of this study, where we used K-Harmonic Means adjusted to handle mixed data. K-Harmonic Means is an extension of K-Means that reduces sensitivity to random centroid initialization. Imputation is carried out by assigning observations with missing values to clusters and replacing each missing value with information from the centroid of the same cluster. The simulation results were evaluated using the root mean square error for numerical data and the accuracy of the imputed values for categorical data.
AB - Data cleaning is one step in preprocessing, and missing values are often found in the dataset during this step. A missing value is the absence of a data item for a subject. A quick way to handle missing values is to remove the records that contain them, but this can reduce the information in the data. Another way is imputation, either with the mean, median, or mode, or with methods such as regression, likelihood, and the clustering approach. Imputation with the clustering approach is the focus of this study, where we used K-Harmonic Means adjusted to handle mixed data. K-Harmonic Means is an extension of K-Means that reduces sensitivity to random centroid initialization. Imputation is carried out by assigning observations with missing values to clusters and replacing each missing value with information from the centroid of the same cluster. The simulation results were evaluated using the root mean square error for numerical data and the accuracy of the imputed values for categorical data.
UR - http://www.scopus.com/inward/record.url?scp=85077974159&partnerID=8YFLogxK
U2 - 10.1063/1.5141651
DO - 10.1063/1.5141651
M3 - Conference contribution
AN - SCOPUS:85077974159
T3 - AIP Conference Proceedings
BT - International Conference on Science and Applied Science, ICSAS 2019
A2 - Suparmi, A.
A2 - Nugraha, Dewanta Arya
PB - American Institute of Physics Inc.
Y2 - 20 July 2019
ER -