Automatically building a corpus for sentiment analysis on Indonesian tweets

Alfan Farizki Wicaksono, Clara Vania, Bayu T. Distiawan, Mirna Adriani

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

25 Citations (Scopus)

Abstract

The popularity of user-generated content, such as that on Twitter, has made it a rich source for sentiment analysis and opinion mining tasks. This paper presents our study on automatically building a training corpus for sentiment analysis on Indonesian tweets. We start with a seed sentiment corpus and subsequently expand it using a classifier model whose parameters are estimated using the Expectation-Maximization (EM) framework. We apply our automatically built corpus to two tasks, namely opinion tweet extraction and tweet polarity classification, using various machine learning approaches. Experimental results show that a classifier model trained on our data, which is automatically constructed using our proposed method, outperforms the baseline system in both opinion tweet extraction and tweet polarity classification.
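
The seed-and-expand idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier, so this sketch assumes a multinomial Naive Bayes model, a common choice for EM-based semi-supervised text classification. The function names (`train_nb`, `posterior`, `expand_corpus`), the confidence threshold, and the toy tweets are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

LABELS = ("pos", "neg")

def train_nb(weighted_docs, vocab, alpha=1.0):
    """M-step: fit multinomial Naive Bayes from (tokens, {label: weight}) pairs."""
    class_mass = {c: alpha for c in LABELS}
    word_mass = {c: defaultdict(float) for c in LABELS}
    for tokens, weights in weighted_docs:
        for c in LABELS:
            class_mass[c] += weights[c]
            for tok in tokens:
                word_mass[c][tok] += weights[c]
    total = sum(class_mass.values())
    log_prior = {c: math.log(class_mass[c] / total) for c in LABELS}
    log_like = {}
    for c in LABELS:
        # Laplace smoothing with pseudo-count alpha per vocabulary word.
        denom = sum(word_mass[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_mass[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_like

def posterior(tokens, model):
    """E-step for one tweet: return P(label | tokens) under the current model."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in LABELS
    }
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def expand_corpus(seed, unlabeled, iterations=10, threshold=0.7):
    """Grow a labeled seed corpus with confidently classified unlabeled tweets."""
    vocab = {t for toks, _ in seed for t in toks} | {t for toks in unlabeled for t in toks}
    # Seed tweets keep hard (0/1) label weights throughout.
    hard = [(toks, {c: 1.0 if c == y else 0.0 for c in LABELS}) for toks, y in seed]
    model = train_nb(hard, vocab)
    for _ in range(iterations):
        soft = [(toks, posterior(toks, model)) for toks in unlabeled]  # E-step
        model = train_nb(hard + soft, vocab)                           # M-step
    expanded = list(seed)
    for toks in unlabeled:
        post = posterior(toks, model)
        label = max(post, key=post.get)
        if post[label] >= threshold:  # hypothetical cut-off, not from the paper
            expanded.append((toks, label))
    return expanded

# Toy, hand-made token lists standing in for tokenized Indonesian tweets.
seed = [(["bagus", "sekali"], "pos"), (["jelek", "sekali"], "neg")]
unlabeled = [["bagus", "mantap"], ["jelek", "parah"], ["mantap"], ["parah"]]
expanded = expand_corpus(seed, unlabeled)
for tokens, label in expanded:
    print(label, " ".join(tokens))
```

In this toy run, the two unlabeled tweets that share a word with the seed corpus pass the threshold and join the corpus, while the single-word tweets stay unlabeled. A real pipeline would need proper tweet tokenization and a much larger unlabeled pool.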

Original language: English
Title of host publication: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
Editors: Prachya Boonkwan, Wirote Aroonmanakun, Thepchai Supnithi
Publisher: Faculty of Pharmaceutical Sciences, Chulalongkorn University
Pages: 185-194
Number of pages: 10
ISBN (Electronic): 9786165518871
Publication status: Published - 1 Jan 2014
Event: 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014 - Phuket, Thailand
Duration: 12 Dec 2014 to 14 Dec 2014

Publication series

Name: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014

Conference

Conference: 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
Country/Territory: Thailand
City: Phuket
Period: 12/12/14 to 14/12/14
