Identification of hate speech and abusive language on Indonesian Twitter using the word2vec, part of speech and emoji features

Muhammad Okky Ibrohim, Muhammad Akbar Setiadi, Indra Budi

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Citations (Scopus)

Abstract

Freedom of speech on social media in Indonesia makes the spread of hate speech and abusive language inevitable. Without proper handling, this will lead to social disharmony between individuals and communities. Identifying hate speech and abusive language in Indonesian-language tweets is challenging. Because word embeddings capture aspects of a sentence's meaning, semantic features of this kind can help in interpreting tweets that contain hateful and abusive words. In this study, a word embedding (word2vec) feature, used alone and in combination with part-of-speech and/or emoji features, was applied to identify hate speech and abusive language in Indonesian-language tweets. Combinations of unigram features with part-of-speech and/or emoji features were also evaluated. The classification algorithms used were Support Vector Machine, Random Forest Decision Tree, and Logistic Regression. The combination of unigram, part-of-speech and emoji features achieved the highest accuracy, 79.85%, with an F-measure of 87.51%.
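
The abstract describes combining unigram, part-of-speech and emoji features before classification. The Python sketch below illustrates one way such a combination could work under stated assumptions; it is not the authors' implementation. Assumptions: POS tags come from an external tagger (not shown), emoji are detected with a rough Unicode-range check, logistic regression is trained with plain stochastic gradient descent, word2vec features are omitted to keep it dependency-free, and all function names, tag labels and the four toy tweets are hypothetical.

```python
# Hypothetical sketch, not the authors' implementation: unigram + POS + emoji
# features concatenated into one sparse vector, then logistic regression.
import math
from collections import Counter


def is_emoji(ch):
    # Rough assumption: treat two common emoji code-point blocks as "emoji".
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def extract_features(tokens, pos_tags, raw_text):
    """Build a sparse feature dict; the prefixes keep the three feature groups apart.

    pos_tags is assumed to come from an external POS tagger (not shown).
    """
    feats = Counter()
    for tok in tokens:
        feats["uni=" + tok.lower()] += 1   # unigram counts
    for tag in pos_tags:
        feats["pos=" + tag] += 1           # part-of-speech tag counts
    for ch in raw_text:
        if is_emoji(ch):
            feats["emo=" + ch] += 1        # emoji counts
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, feats):
    # Linear score: bias plus weighted sum of the sparse feature values.
    weights, bias = model
    return bias + sum(weights[f] * v for f, v in feats.items())


def train_logreg(samples, labels, epochs=200, lr=0.1):
    """Logistic regression trained with plain stochastic gradient descent."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            err = y - sigmoid(score((weights, bias), feats))
            bias += lr * err
            for f, v in feats.items():
                weights[f] += lr * err * v
    return weights, bias


def predict(model, feats):
    return 1 if score(model, feats) >= 0 else 0


def accuracy_and_f1(gold, pred):
    """Accuracy and binary F-measure, treating label 1 (abusive) as positive."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


# Hypothetical toy data: (raw text, tokens, POS tags, label), label 1 = abusive.
TOY_TWEETS = [
    ("kamu bodoh 😡", ["kamu", "bodoh"], ["PRP", "JJ"], 1),
    ("dasar bodoh sekali 😡", ["dasar", "bodoh", "sekali"], ["NN", "JJ", "RB"], 1),
    ("terima kasih banyak 😊", ["terima", "kasih", "banyak"], ["VB", "NN", "JJ"], 0),
    ("selamat pagi semua 😊", ["selamat", "pagi", "semua"], ["JJ", "NN", "DT"], 0),
]

if __name__ == "__main__":
    X = [extract_features(toks, tags, text) for text, toks, tags, _ in TOY_TWEETS]
    y = [label for *_, label in TOY_TWEETS]
    model = train_logreg(X, y)
    print(accuracy_and_f1(y, [predict(model, x) for x in X]))
```

Prefixing each feature name with its group (`uni=`, `pos=`, `emo=`) is one simple way to concatenate heterogeneous features without name collisions. A word2vec feature would instead contribute dense dimensions (for example, averaged word vectors), which this sketch leaves out.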

Original language: English
Title of host publication: Proceedings of the International Conference on Advanced Information Science and System, AISS 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450372916
Publication status: Published - 15 Nov 2019
Event: 2019 International Conference on Advanced Information Science and System, AISS 2019 - Singapore, Singapore
Duration: 15 Nov 2019 to 17 Nov 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2019 International Conference on Advanced Information Science and System, AISS 2019
Country/Territory: Singapore
City: Singapore
Period: 15/11/19 to 17/11/19

Keywords

  • Abusive Language
  • Hate Speech
  • Machine Learning
  • Twitter

