Hate Code Detection in Indonesian Tweets using Machine Learning Approach: A Dataset and Preliminary Study

Damayanti Elisabeth, Indra Budi, Muhammad Okky Ibrohim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)

Abstract

Social media has the side effect of turning freedom of speech into freedom to hate. People can spread hate speech in creative ways to evade hate speech detectors. Implicit intent is often conveyed through codes, which are used to disguise the targets of the hate speech. This paper presents an implementation of hate code detection for Indonesian tweets using machine learning and a classification explainer. First, we developed a dataset to serve as ground truth for hate codes. We generated hate codes under two scenarios: hate codes derived from hate speech classification and hate codes derived from hate code classification. We used Logistic Regression (LR), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) as classifiers, with TF-IDF and word bigrams as features. The codes take the form of words and phrases. The best F-measure is 94.90%, obtained from hate code classification using Logistic Regression with abusive code elimination. This score reflects the model's ability to detect tweets that have no hate codes. For tweets annotated as containing hate codes, the F-measure for recognizing all the hate codes is 28.23%, and the recall is 56.91%.
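The feature setup named in the abstract (TF-IDF over words and word bigrams) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names `word_ngrams` and `tfidf`, and the placeholder documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n_values=(1, 2)):
    """Return lowercase word n-grams (unigrams and bigrams by default)."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    grams = []
    for n in n_values:
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs):
    """Compute one TF-IDF weight dictionary per document.

    Assumption: term frequency is normalized by the number of n-grams in the
    document, and IDF is smoothed as ln((1 + N) / (1 + df)) + 1.
    """
    counts = [Counter(word_ngrams(d)) for d in docs]
    # Document frequency: in how many documents each n-gram appears
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            term: (freq / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, freq in c.items()
        })
    return vectors


# Placeholder strings for illustration only, not data from the paper
docs = ["alpha beta gamma", "alpha beta delta"]
vectors = tfidf(docs)
```

In this toy example, the bigram "beta gamma" appears in only one document, so it receives a higher weight than "alpha beta", which appears in both. Vectors of this kind would then be passed to a classifier such as Logistic Regression.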

Original language: English
Title of host publication: 2020 8th International Conference on Information and Communication Technology, ICoICT 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728161426
Publication status: Published - Jun 2020
Event: 8th International Conference on Information and Communication Technology, ICoICT 2020 - Yogyakarta, Indonesia
Duration: 24 Jun 2020 to 26 Jun 2020

Publication series

Name: 2020 8th International Conference on Information and Communication Technology, ICoICT 2020

Conference

Conference: 8th International Conference on Information and Communication Technology, ICoICT 2020
Country/Territory: Indonesia
City: Yogyakarta
Period: 24/06/20 to 26/06/20

Keywords

  • hate code
  • hate code detection
  • hate speech
  • multi-label classification
