Design of intelligent k-means based on spark for big data clustering

Ilham Kusuma, M. Anwar Ma'Sum, Novian Habibie, Wisnu Jatmiko, Heru Suhartanto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

24 Citations (Scopus)

Abstract

The growth of data has brought us to the big data era, in which the volume of data can no longer be processed in a conventional computing environment. Many computational environments have been developed to process big data; one of them is Hadoop, which provides a distributed file system and the MapReduce framework. Spark is a newer framework that can be combined with Hadoop and run on top of it. In this paper, we design an intelligent k-means based on Spark for big data clustering. Our design operates on batches of data instead of the original Resilient Distributed Dataset (RDD). We compare our design with an implementation that uses the original RDD. Experimental results show that the implementation using batches of data is faster than the implementation using the original RDD.
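For readers unfamiliar with the term, the following is a minimal single-machine sketch of the anomalous-pattern initialization commonly associated with "intelligent k-means" in the clustering literature. It is an assumption that the paper builds on this procedure; the abstract does not spell out the algorithm, and the sketch does not reproduce the authors' Spark or batch-based design. The function names and the `min_size` threshold are hypothetical and chosen for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def anomalous_pattern(points, reference, max_iter=100):
    """Extract one cluster around the point farthest from `reference`.

    Returns (centroid, members, rest). `reference` stays fixed throughout.
    """
    centroid = max(points, key=lambda p: dist(p, reference))
    member_idx = []
    for _ in range(max_iter):
        # A point joins the cluster if it is closer to the centroid
        # than to the fixed reference point.
        new_idx = [i for i, p in enumerate(points)
                   if dist(p, centroid) < dist(p, reference)]
        if not new_idx or new_idx == member_idx:
            break
        member_idx = new_idx
        centroid = mean([points[i] for i in member_idx])
    if not member_idx:
        # Degenerate case: all remaining points coincide with the reference.
        member_idx = list(range(len(points)))
        centroid = mean(points)
    chosen = set(member_idx)
    members = [points[i] for i in member_idx]
    rest = [p for i, p in enumerate(points) if i not in chosen]
    return centroid, members, rest


def intelligent_kmeans_init(points, min_size=2):
    """Return initial centroids; clusters smaller than `min_size` are dropped.

    `min_size` is a hypothetical threshold, not a value taken from the paper.
    """
    reference = mean(points)  # grand mean of the full data set
    remaining = list(points)
    centroids = []
    while remaining:
        centroid, members, remaining = anomalous_pattern(remaining, reference)
        if len(members) >= min_size:
            centroids.append(centroid)
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    print(intelligent_kmeans_init(data))
```

The returned centroids, and their count, would then seed an ordinary k-means run, so the number of clusters does not have to be chosen in advance. In a Spark setting the distance comparisons and mean updates would be expressed as distributed operations; how the authors do this with batches of data is not described in the abstract.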

Original language: English
Title of host publication: 2016 International Workshop on Big Data and Information Security, IWBIS 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 89-95
Number of pages: 7
ISBN (Electronic): 9781509034772
Publication status: Published - 6 Mar 2017
Event: 2016 International Workshop on Big Data and Information Security, IWBIS 2016 - Jakarta, Indonesia
Duration: 18 Oct 2016 → 19 Oct 2016

Publication series

Name: 2016 International Workshop on Big Data and Information Security, IWBIS 2016

Conference

Conference: 2016 International Workshop on Big Data and Information Security, IWBIS 2016
Country/Territory: Indonesia
City: Jakarta
Period: 18/10/16 → 19/10/16

Keywords

  • Hadoop
  • Spark
  • intelligent kmeans
