Hierarchical Ordered Partitioning And Collapsing Hybrid (HOPACH) is a clustering method that combines the strengths of partitioning and agglomerative clustering. The partitioning step can use any partitioning algorithm, such as PAM, K-Means, or SOM. It is followed by an ordering step and then by an agglomerative (collapsing) step. The number of main clusters is determined by the Mean Split Silhouette (MSS), which measures the heterogeneity of a clustering result: the lower the MSS, the more homogeneous the members of each cluster. We therefore select the number of clusters from the clustering result with the minimum MSS. In this implementation of HOPACH, we use the K-Means algorithm for the partitioning step to cluster and analyze 136 DNA sequences of Ebola viruses. The process starts by collecting the DNA sequences from GenBank, followed by feature extraction using N-mer frequencies. The extracted features are compiled into a feature matrix and normalized to the interval [0, 1] using min-max normalization. The normalized matrix is then used to generate a genetic distance matrix based on Euclidean distance, which serves as input to the K-Means partitioning step in HOPACH. As a result, we obtained 8 clusters with a minimum MSS of 0.50266. The clustering in this article was carried out with the open source program R.
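The preprocessing steps described in the abstract (N-mer frequency extraction, min-max normalization to [0, 1], and a Euclidean distance matrix) can be illustrated with a minimal sketch. This is not the authors' R code: the function names, the choice of k = 2, the handling of ambiguous bases, and the toy sequences are all assumptions made for illustration, and the HOPACH clustering step itself is not reproduced here.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Relative frequency of every length-k word over the alphabet (hypothetical helper)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # assumption: skip windows with ambiguous bases such as N
            counts[word] += 1
            windows += 1
    return [counts[w] / windows if windows else 0.0 for w in kmers]


def min_max_normalize(matrix):
    """Rescale each feature column to [0, 1]; constant columns are mapped to 0."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in matrix
    ]


def euclidean_distance_matrix(matrix):
    """Pairwise Euclidean distances between the rows of the feature matrix."""
    n = len(matrix)
    return [
        [sqrt(sum((a - b) ** 2 for a, b in zip(matrix[i], matrix[j]))) for j in range(n)]
        for i in range(n)
    ]


# Toy sequences for illustration only; they are not taken from GenBank.
sequences = ["ACGTACGTAC", "ACGTTCGTAC", "GGGCCCGGGC"]
features = [kmer_frequencies(s, k=2) for s in sequences]
normalized = min_max_normalize(features)
distances = euclidean_distance_matrix(normalized)
```

In this toy input the first two sequences differ by a single base, so their distance is smaller than the distance from either of them to the third sequence. A distance matrix of this kind is what the abstract describes as the input to the K-Means partitioning step.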
Journal: IOP Conference Series: Earth and Environmental Science
Publication status: Published - 9 Apr 2019
Event: 1st International Conference on Environmental Geography and Geography Education, ICEGE 2018 - Jember, East Java, Indonesia
Duration: 17 Nov 2018 → 18 Nov 2018