Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm

Khoirul Umam, Alhadi B., Dian Lestari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key routine tasks in molecular biology, particularly in bioinformatics applications. There are two types of clustering: hierarchical and partitioning. In this paper, we combine the two types, K-Means (partitioning clustering) and DIANA (hierarchical clustering), into what is therefore called hybrid clustering. Hybrid clustering using the Parallel K-Means algorithm and the DIANA algorithm is applied to cluster DNA sequences of Human Papillomavirus (HPV). The clustering process starts with collecting HPV DNA sequences from NCBI (National Center for Biotechnology Information), followed by feature extraction from the DNA sequences. The extracted features are stored in matrix form; this matrix is then normalized using Min-Max normalization, and genetic distances are calculated using the Euclidean distance. The hybrid clustering is then applied using implementations of the Parallel K-Means and DIANA algorithms. The aim of using hybrid clustering is to obtain better clustering results. To validate the resulting clusters and determine the optimal number of clusters, we use the Davies-Bouldin Index (DBI). In this study, Parallel K-Means clustering groups the data into 5 clusters with a minimum DBI value of 0.8741, and hybrid clustering groups the data into 13 sub-clusters with minimum DBI values of 0.8216, 0.6845, 0.3331, 0.1994, and 0.3952. The DBI values of hybrid clustering are lower than the DBI value of Parallel K-Means clustering performed alone at the first stage. This means that hybrid clustering gives a better result for clustering HPV DNA sequences than Parallel K-Means clustering alone.
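The preprocessing and validation steps named in the abstract (Min-Max normalization, Euclidean distance, and the Davies-Bouldin Index) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the toy points, and the cluster labels are assumptions made for the example, and the abstract does not say which features are extracted from the HPV sequences. The index is computed in its usual form, with each cluster's scatter taken as the mean distance of its members to the cluster centroid.

```python
import math


def min_max_normalize(rows):
    """Scale each column of a list-of-lists matrix to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        # A constant column has no spread, so map it to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def davies_bouldin(points, labels):
    """Davies-Bouldin Index; lower values mean more compact, better separated clusters.

    Assumes at least two clusters whose centroids are distinct.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    keys = sorted(clusters)
    centers = {k: centroid(clusters[k]) for k in keys}
    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = {
        k: sum(euclidean(p, centers[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }
    total = 0.0
    for i in keys:
        # Worst-case similarity ratio of cluster i against every other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in keys if j != i
        )
    return total / len(keys)


# Hypothetical toy data: four 2-D points forming two well-separated pairs.
toy_points = [[0, 0], [0, 2], [10, 0], [10, 2]]
good = davies_bouldin(toy_points, [0, 0, 1, 1])  # groups nearby points together
bad = davies_bouldin(toy_points, [0, 1, 0, 1])   # mixes the two pairs
```

On the toy data, the labeling that groups nearby points yields the lower index. That is the same criterion the abstract uses to compare Parallel K-Means alone against the hybrid approach.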

Original language: English
Title of host publication: Symposium on Biomathematics, SYMOMATH 2016
Editors: Beben Benyamin, Kasbawati
Publisher: American Institute of Physics Inc.
ISBN (Electronic): 9780735414938
Publication status: Published - 27 Mar 2017
Event: 4th International Symposium on Biomathematics, SYMOMATH 2016 - Makassar, Indonesia
Duration: 7 Oct 2016 → 9 Oct 2016

Publication series

Name: AIP Conference Proceedings
Volume: 1825
ISSN (Print): 0094-243X
ISSN (Electronic): 1551-7616

Conference

Conference: 4th International Symposium on Biomathematics, SYMOMATH 2016
Country: Indonesia
City: Makassar
Period: 7/10/16 → 9/10/16
