Implementation of hybrid clustering based on partitioning around medoids algorithm and divisive analysis on human Papillomavirus DNA

Mentari Dian Arimbi, Alhadi B., Dian Lestari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Data clustering can be performed with partitioning or hierarchical methods for many types of data, including DNA sequences. The two approaches can be combined by running a partitioning algorithm at the first level and a hierarchical algorithm at the second level, which is called hybrid clustering. In the partitioning phase, popular methods such as PAM, K-means, or Fuzzy c-means can be applied. In this study we selected partitioning around medoids (PAM) for the partitioning stage. In the hierarchical stage that follows, we applied the divisive analysis algorithm (DIANA) to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies-Bouldin Index (DBI): we choose the number of clusters that minimizes the DBI value. In this work, we cluster 1252 HPV DNA sequences from GenBank. Characteristic (feature) extraction is performed first, followed by normalization and genetic distance calculation using Euclidean distance. We implemented the hybrid PAM and DIANA method in R, an open-source programming tool. Using PAM in the first stage, we obtained 3 main clusters with an average DBI value of 0.979. After executing DIANA in the second stage, we obtained 4 sub-clusters for Cluster-1, 9 sub-clusters for Cluster-2, and 2 sub-clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for each main cluster, respectively. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
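
The Davies-Bouldin Index used above to choose the number of clusters can be illustrated with a minimal sketch. This is not the authors' R implementation: it is a NumPy version that assumes the common centroid-based definition with Euclidean distance, and the function name `davies_bouldin_index` and the toy data are hypothetical.

```python
import numpy as np


def davies_bouldin_index(X, labels):
    """Centroid-based Davies-Bouldin Index with Euclidean distance.

    Lower values indicate compact, well-separated clusters.
    (Illustrative sketch; the paper's exact variant is an assumption.)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("DBI needs at least two clusters")

    # One centroid per cluster.
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])

    # scatter[i]: mean distance from the members of cluster i to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])

    # sep[i, j]: distance between the centroids of clusters i and j.
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # excludes the i == j pair from the max

    # For each cluster take its worst (largest) similarity ratio, then average.
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return float(ratios.max(axis=1).mean())


# Toy data (hypothetical): two tight groups far apart along the x-axis.
points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good_labels = [0, 0, 1, 1]  # groups nearby points together
bad_labels = [0, 1, 0, 1]   # mixes the two groups

dbi_good = davies_bouldin_index(points, good_labels)
dbi_bad = davies_bouldin_index(points, bad_labels)
print(dbi_good, dbi_bad)
```

In the same spirit as the abstract, one would compute this value for each candidate number of clusters and keep the partition with the smallest DBI.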

Original language: English
Title of host publication: Symposium on Biomathematics, SYMOMATH 2016
Editors: Beben Benyamin, Kasbawati
Publisher: American Institute of Physics Inc.
ISBN (Electronic): 9780735414938
Publication status: Published - 27 Mar 2017
Event: 4th International Symposium on Biomathematics, SYMOMATH 2016 - Makassar, Indonesia
Duration: 7 Oct 2016 to 9 Oct 2016

Publication series

Name: AIP Conference Proceedings
Volume: 1825
ISSN (Print): 0094-243X
ISSN (Electronic): 1551-7616

Conference

Conference: 4th International Symposium on Biomathematics, SYMOMATH 2016
Country/Territory: Indonesia
City: Makassar
Period: 7/10/16 to 9/10/16
