Improving classification performance by extending documents terms

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Classification is a data mining technique for categorizing objects. Text classification becomes more challenging when the documents are very short, as is typical of social media collections. This paper proposes a method to improve classification performance on short documents. In this work, we expand the words in every document before the documents are classified. We use a TFIDF model, Hidden Markov Model k-means clustering, and Latent Semantic Indexing (LSI) to expand the documents. The results show that extending a document's terms by just 1 word increases classification accuracy, while extending by 2, 4, and 8 words tends to give stable results.
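
The abstract does not say how the expansion words are selected, so the following is only a minimal sketch of the general idea of extending a short document's terms before classification. It assumes, purely for illustration, that each document borrows up to `k` terms from its most similar neighbour under TF-IDF cosine similarity. The function names `tfidf_vectors`, `cosine`, and `expand_documents` are hypothetical and are not taken from the paper, and the Hidden Markov Model k-means and LSI variants are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_documents(docs, k=1):
    """Append up to k new terms to each document (illustrative assumption:
    terms come from the nearest neighbour, highest TF-IDF weight first)."""
    if len(docs) < 2:
        return [list(doc) for doc in docs]
    vectors = tfidf_vectors(docs)
    expanded = []
    for i, doc in enumerate(docs):
        # Nearest neighbour of document i among all other documents.
        best = max(
            (j for j in range(len(docs)) if j != i),
            key=lambda j: cosine(vectors[i], vectors[j]),
        )
        # Candidate terms not already in the document; ties broken by name.
        candidates = sorted(
            (t for t in vectors[best] if t not in doc),
            key=lambda t: (-vectors[best][t], t),
        )
        expanded.append(list(doc) + candidates[:k])
    return expanded


docs = [
    ["cheap", "flight", "deal"],
    ["flight", "ticket", "deal", "airline"],
    ["football", "match", "goal"],
    ["goal", "score", "match", "league"],
]
for original, longer in zip(docs, expand_documents(docs, k=1)):
    print(original, "->", longer)
```

The expanded token lists would then be passed to whatever classifier is in use. Varying `k` over 1, 2, 4, and 8 mirrors the expansion sizes compared in the abstract.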

Original language: English
Title of host publication: Proceedings of 2014 International Conference on Data and Software Engineering, ICODSE 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479979967
Publication status: Published - 17 Mar 2014
Event: 2014 International Conference on Data and Software Engineering, ICODSE 2014 - Bandung, Indonesia
Duration: 26 Nov 2014 to 27 Nov 2014

Publication series

Name: Proceedings of 2014 International Conference on Data and Software Engineering, ICODSE 2014

Conference

Conference: 2014 International Conference on Data and Software Engineering, ICODSE 2014
Country/Territory: Indonesia
City: Bandung
Period: 26/11/14 to 27/11/14

Keywords

  • Hidden Markov Model k-means
  • Latent Semantic Indexing
  • TFIDF model
  • extend words
  • text classification
