Hate speech identification using the hate codes for Indonesian tweets

Nur Indah Pratiwi, Indra Budi, Meganingrum Arista Jiwanggi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Hate speech has become a major source of negativity across social media. As social media platforms have become aware of this issue, they have gradually introduced regulations to curb its spread, e.g. by automatically blocking or suspending accounts or posts that contain hate speech. However, social media users have become more creative in expressing hate speech. To evade these regulations, users often communicate with one another using special codes. This study aims to use these hate codes to identify hate speech in social media data. We used Indonesian tweets as the dataset and Logistic Regression, Support Vector Machine, Naïve Bayes, and Random Forest Decision Tree as the classifiers. The highest F-measure for hate speech identification was 80.71%, obtained using the hate code feature combined with Logistic Regression as the classifier.

Original language: English
Title of host publication: Proceedings of the 2019 2nd International Conference on Data Science and Information Technology, DSIT 2019
Publisher: Association for Computing Machinery
Pages: 128-133
Number of pages: 6
ISBN (Electronic): 9781450371414
Publication status: Published - 19 Jul 2019
Event: 2nd International Conference on Data Science and Information Technology, DSIT 2019 - Seoul, Korea, Republic of
Duration: 19 Jul 2019 to 21 Jul 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2nd International Conference on Data Science and Information Technology, DSIT 2019
Country/Territory: Korea, Republic of
City: Seoul
Period: 19/07/19 to 21/07/19

Keywords

  • Classification
  • Hate code
  • Hate speech
  • Twitter
