TY - JOUR
T1 - Tweet Clustering in Indonesian Language Twitter Social Media using Naive Bayes Classifier Method
AU - Adek, Rizal Tjut
AU - Nasution, Sahlan
N1 - Publisher Copyright:
© 2018 by the authors; licensee Modestum Ltd., UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution License
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Twitter is a social media platform widely used for many purposes, especially sharing information, communication, entertainment, and self-expression. Tweets cover a wide range of topics, such as culture, sports, culinary, tourism, music, and politics. The purpose of this research is to build an application that groups tweets into sports and non-sports categories using the Naive Bayes classifier method. Text mining is a set of techniques used to handle classification, clustering, information extraction, and information retrieval problems, and classifying tweets automatically requires one of these techniques. The learning outcomes, in the form of probabilities, are used to process tweet documents whose category is not yet known. Each tweet document goes through text pre-processing and is then split into unigrams (one word), bigrams (two words), and trigrams (three words). To determine the category of a tweet document whose category is not yet known, the category results produced by the three n-gram types are compared. The system was tested using 100 to 2000 training documents per category and 10 test documents per category. Accuracy was 60% with 100 training documents, 65% with 200, and 90% with 2000. The conclusion is that the more training data is used for learning, the higher the success rate in categorizing a tweet document.
KW - Clustering
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85063182684&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85063182684
SN - 1306-3057
VL - 13
SP - 277
EP - 284
JO - Eurasian Journal of Analytical Chemistry
JF - Eurasian Journal of Analytical Chemistry
IS - 6
ER -