Sparse data for document clustering

Ionia Veritawati, Ito Wasito, Mujiono

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Document clustering, which is part of the text mining framework, is used to group model data and a real collection of cancer documents into several clusters. A vector space model of the documents is built from their key phrases; the resulting matrix is sparse, that is, it contains many zero values. A sparse dimensionality reduction and several clustering methods, including K-means, Self-Organizing and Non-negative Matrix Factorization (NMF), are applied to the data, and the results are compared. Using a sparse method based on the Arnoldi method in the dimensionality reduction step yields a clustering validity about twice that obtained with standard dimensionality reduction.
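As a rough illustration of the sparse vector space model and the K-means step mentioned in the abstract, the sketch below stores only the non-zero key-phrase counts of each document and clusters the rows. It is not the authors' implementation: the toy documents, the function names (`build_sparse_rows`, `sparsity`, `kmeans`) and the choice of initial centroids are assumptions made for illustration, and the Arnoldi-based dimensionality reduction, the self-organizing method and NMF are left out.

```python
from collections import Counter

# Hypothetical toy corpus (not the paper's data): one list of key phrases per document.
docs = [
    ["breast cancer", "tumor", "chemotherapy"],
    ["breast cancer", "chemotherapy", "radiation"],
    ["lung cancer", "smoking", "tumor"],
    ["lung cancer", "smoking", "radiation"],
]

def build_sparse_rows(docs):
    """Return (vocab, rows); each row maps column index -> count, zeros are not stored."""
    vocab = {p: i for i, p in enumerate(sorted({p for d in docs for p in d}))}
    rows = [{vocab[p]: float(c) for p, c in Counter(d).items()} for d in docs]
    return vocab, rows

def sparsity(rows, n_cols):
    """Fraction of entries of the document-by-phrase matrix that are zero."""
    stored = sum(len(r) for r in rows)
    return 1.0 - stored / (len(rows) * n_cols)

def sq_dist(row, centroid):
    """Squared Euclidean distance between a sparse row and a dense centroid."""
    return sum((row.get(j, 0.0) - c) ** 2 for j, c in enumerate(centroid))

def kmeans(rows, n_cols, init, n_iter=20):
    """Plain K-means; `init` lists the row indices used as starting centroids (an assumption)."""
    centroids = [[rows[i].get(j, 0.0) for j in range(n_cols)] for i in init]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every document.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(r, centroids[c]))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:
                centroids[c] = [
                    sum(m.get(j, 0.0) for m in members) / len(members)
                    for j in range(n_cols)
                ]
    return labels

vocab, rows = build_sparse_rows(docs)
zero_fraction = sparsity(rows, len(vocab))
labels = kmeans(rows, len(vocab), init=[0, 2])
print(zero_fraction, labels)
```

On this toy corpus half of the matrix entries are zero, and the two breast cancer documents end up in one cluster and the two lung cancer documents in the other. Real key-phrase matrices are far sparser, which is the motivation for a dimensionality reduction step that works on the sparse form directly.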

Original language: English
Title of host publication: 2013 International Conference of Information and Communication Technology, ICoICT 2013
Pages: 38-43
Number of pages: 6
Publication status: Published - 10 Sep 2013
Event: 2013 International Conference of Information and Communication Technology, ICoICT 2013 - Bandung, Indonesia
Duration: 20 Mar 2013 to 22 Mar 2013

Publication series

Name: 2013 International Conference of Information and Communication Technology, ICoICT 2013

Conference

Conference: 2013 International Conference of Information and Communication Technology, ICoICT 2013
Country: Indonesia
City: Bandung
Period: 20/03/13 to 22/03/13

Keywords

  • arnoldi method
  • competitive learning
  • k-means
  • non-negative matrices factorization
  • self-organizing
  • sparse
