Tweet Clustering in Indonesian Language Twitter Social Media using Naive Bayes Classifier Method

Rizal Tjut Adek, Taufik Fuadi Abidin, Khairul Munadi, Zainal Arifin Hasibuan, Sahlan Nasution

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Twitter is a social media platform widely used for information sharing, communication, entertainment, and self-expression. Tweets cover many topics, such as culture, sports, culinary, tourism, music, and politics. The purpose of this research is to build an application that groups tweets from Twitter into sports and non-sports categories using the Naive Bayes classifier method. Text mining is a set of techniques for handling classification, clustering, information extraction, and information retrieval problems, and classifying tweets automatically requires one of these techniques. The probabilities learned during training are used to process tweet documents whose category is not yet known. Each tweet document goes through text pre-processing and is then broken into unigrams (one word), bigrams (two words), and trigrams (three words). To determine the category of a tweet document that is not yet known, the categories obtained from the three n-gram representations are compared. The system was tested using 100 to 2000 training documents in each category and 10 test documents in each category. Accuracy was 60% with 100 training documents, 65% with 200, and 90% with 2000. The conclusion is that the success rate of categorizing a tweet document increases as more training data is used.
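The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the three n-gram results are compared, so the sketch assumes a simple majority vote, and it also assumes multinomial Naive Bayes with add-one (Laplace) smoothing and a regex tokenizer as the pre-processing step. All names (`tokenize`, `ngrams`, `NGramNaiveBayes`, `classify`) and the toy Indonesian tweets are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Simplified pre-processing (assumption): lowercase, keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NGramNaiveBayes:
    """Multinomial Naive Bayes over n-grams of one fixed order."""

    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.doc_counts = Counter()         # label -> number of training docs
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            grams = ngrams(tokenize(text), self.n)
            self.counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = ngrams(tokenize(text), self.n)
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            total = sum(self.counts[label].values())
            # Log prior plus summed log likelihoods with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            for g in grams:
                score += math.log((self.counts[label][g] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(text, models):
    """Majority vote over the per-order models (an assumed comparison rule)."""
    votes = Counter(m.predict(text) for m in models)
    return votes.most_common(1)[0][0]


# Toy training data (hypothetical, far smaller than the 100 to 2000 documents
# per category used in the study).
train_docs = [
    "timnas indonesia menang pertandingan sepak bola",
    "pemain bulu tangkis juara turnamen",
    "pertandingan sepak bola liga malam ini",
    "harga makanan di pasar naik",
    "konser musik malam ini sangat meriah",
    "wisata kuliner di kota bandung",
]
train_labels = ["sports"] * 3 + ["non_sports"] * 3

# One model each for unigrams, bigrams, and trigrams.
models = [NGramNaiveBayes(n).fit(train_docs, train_labels) for n in (1, 2, 3)]

print(classify("pertandingan sepak bola timnas", models))
print(classify("konser musik di kota", models))
```

With three models and two categories a vote can never tie, which is one reason a majority rule is a convenient stand-in here; the published system may combine the n-gram results differently.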

Original language: English
Pages (from-to): 277-284
Number of pages: 8
Journal: Eurasian Journal of Analytical Chemistry
Volume: 13
Issue number: 6
Publication status: Published - 1 Jan 2018

Keywords

  • Clustering
  • Twitter
