Combining Linguistic, Semantic and Lexicon Feature for Emoji Classification in Twitter Dataset

Rinda Wahyuni, Indra Budi

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

Emoji are picture characters used in social media to express the emotion of a text message. Despite the increasing use of emoji, few studies have examined the relationship between emoji and text. Because emoji are diverse and many have similar meanings, emoji classification is more complex than common text classification tasks. In this paper, we build a computational model by extracting three kinds of features (linguistic, semantic, and lexicon features) to improve emoji classification performance. We then train on 400k tweets using two different classifiers, a Stochastic Gradient Descent classifier and Logistic Regression. The experiments showed that our proposed features with Logistic Regression outperformed the baseline.
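The feature combination described in the abstract can be illustrated with a minimal sketch. It assumes that the linguistic feature is a word n-gram count, the semantic feature is an averaged word embedding, and the lexicon feature is a count of positive and negative lexicon hits, in line with the listed keywords. The toy lexicon, the toy embeddings, the vocabulary, and every function name below are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy resources; the paper's actual lexicon and embeddings are not given here.
TOY_LEXICON = {"love": 1.0, "happy": 1.0, "sad": -1.0, "hate": -1.0}
TOY_EMBEDDINGS = {
    "love": [0.9, 0.1],
    "happy": [0.8, 0.2],
    "sad": [0.1, 0.9],
    "hate": [0.0, 1.0],
}
EMBEDDING_DIM = 2


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def ngram_features(tokens, vocabulary, n=1):
    """Linguistic feature (assumed): counts of word n-grams over a fixed vocabulary."""
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    return [float(counts[gram]) for gram in vocabulary]


def embedding_features(tokens):
    """Semantic feature (assumed): mean of the word vectors of known tokens."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBEDDING_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def lexicon_features(tokens):
    """Lexicon feature (assumed): number of positive and negative lexicon hits."""
    scores = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    return [float(positive), float(negative)]


def combine_features(text, vocabulary):
    """Concatenate the three feature groups into one vector for a tweet."""
    tokens = tokenize(text)
    return (
        ngram_features(tokens, vocabulary)
        + embedding_features(tokens)
        + lexicon_features(tokens)
    )


vocabulary = ["i", "love", "hate", "mondays"]
vector = combine_features("I love love sunny days", vocabulary)
print(len(vector), vector)
```

A vector built this way for each tweet, paired with the emoji as its label, is the kind of input that a linear classifier such as Logistic Regression or a Stochastic Gradient Descent classifier could then be trained on.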

Original language: English
Pages (from-to): 194-201
Number of pages: 8
Journal: Procedia Computer Science
Volume: 135
DOIs
Publication status: Published - 1 Jan 2018
Event: 3rd International Conference on Computer Science and Computational Intelligence, ICCSCI 2018 - Tangerang, Indonesia
Duration: 7 Sep 2018 → 8 Sep 2018

Keywords

  • emoji
  • lexicon
  • n-gram
  • word-embedding
