Application of k-means clustering algorithm in grouping the DNA sequences of hepatitis B virus (HBV)

Alhadi B., Hengki Tasman, N. Yuniarti, Frisca, I. Mursidah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Based on WHO data, an estimated 15 million people worldwide who are infected with hepatitis B (HBsAg+), which is caused by the hepatitis B virus (HBV), are also infected with hepatitis D, which is caused by the hepatitis D virus (HDV). Hepatitis D infection can occur simultaneously with hepatitis B (co-infection) or after a person has developed chronic hepatitis B (super-infection). Since HDV cannot live without HBV, HDV infection is closely related to HBV infection, so it is reasonable to expect that any effort to prevent hepatitis B also indirectly prevents hepatitis D. This paper presents a clustering of HBV DNA sequences using the k-means clustering algorithm and R programming. The clustering process starts with collecting HBV DNA sequences from GenBank, followed by feature extraction from the sequences using n-mer frequencies. The extraction results are collected into a matrix and normalized to the interval [0, 1] using min-max normalization, and the normalized matrix is used as the input data. The number of clusters is two, and the initial centroid of each cluster is chosen randomly. In each iteration, the distance from every object to each centroid is calculated using the Euclidean distance, and each object is assigned to the cluster whose centroid is nearest; this is repeated until two convergent clusters are obtained. As a result, the HBV viruses in the first cluster are more virulent than those in the second cluster, so the HBV viruses in the first cluster can potentially evolve with the HDV viruses that cause hepatitis D.
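The pipeline described in the abstract (n-mer frequency extraction, min-max normalization to [0, 1], then k-means with two clusters, random initial centroids and Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' code: the paper used R, whereas the sketch below is in Python, and the function names, the choice n = 2, the fixed random seed and the four toy strings are all assumptions made for illustration. The toy strings are invented and are not GenBank sequences.

```python
import itertools
import math
import random


def nmer_frequencies(seq, n=2):
    """Count overlapping n-mers over the alphabet ACGT, in a fixed order."""
    nmers = ["".join(p) for p in itertools.product("ACGT", repeat=n)]
    counts = dict.fromkeys(nmers, 0)
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window in counts:  # skip windows containing ambiguous bases
            counts[window] += 1
    return [counts[m] for m in nmers]


def min_max_normalize(matrix):
    """Rescale every column to [0, 1]; constant columns are mapped to 0."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k=2, seed=0, max_iter=100):
    """Plain k-means: random initial centroids, nearest-centroid assignment."""
    rng = random.Random(seed)          # assumed seed, for reproducibility only
    centroids = rng.sample(points, k)  # initial centroids chosen randomly
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:       # memberships stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical toy input: two ACGT-repeat strings and two GC-rich strings.
toy = ["ACGTACGTACGT", "ACGTACGAACGT", "GGGCCCGGGCCC", "GGGCCCGGCCCC"]
features = min_max_normalize([nmer_frequencies(s, n=2) for s in toy])
labels, centroids = k_means(features, k=2, seed=0)
print(labels)
```

On this toy input the first two strings are expected to share one cluster label and the last two the other; which cluster receives label 0 depends on the random initial centroids. Real HBV genomes are about 3.2 kb long, so the value of n and the handling of ambiguous bases would need to follow the paper rather than this sketch.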

Original language: English
Title of host publication: International Symposium on Current Progress in Mathematics and Sciences 2016, ISCPMS 2016
Subtitle of host publication: Proceedings of the 2nd International Symposium on Current Progress in Mathematics and Sciences 2016
Editors: Kiki Ariyanti Sugeng, Djoko Triyono, Terry Mart
Publisher: American Institute of Physics Inc.
ISBN (Electronic): 9780735415362
Publication status: Published - 10 Jul 2017
Event: 2nd International Symposium on Current Progress in Mathematics and Sciences 2016, ISCPMS 2016 - Depok, Jawa Barat, Indonesia
Duration: 1 Nov 2016 to 2 Nov 2016

Publication series

Name: AIP Conference Proceedings
Volume: 1862
ISSN (Print): 0094-243X
ISSN (Electronic): 1551-7616

Conference

Conference: 2nd International Symposium on Current Progress in Mathematics and Sciences 2016, ISCPMS 2016
Country: Indonesia
City: Depok, Jawa Barat
Period: 1/11/16 to 2/11/16
