Abusive Language and Hate Speech Detection for Indonesian-Local Language in Social Media Text

Shofianina Dwi Ananda Putri, Muhammad Okky Ibrohim, Indra Budi

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

On social media, people are free to express their feelings and thoughts. However, people can also use abusive language and hate speech to insult or humiliate individuals or groups on social media platforms such as Twitter. Various detection methods have been developed to control the spread of abusive language and hate speech in Indonesia, but detection still focuses on monolingual text. As a country with many ethnicities and cultures, Indonesia also has a variety of local languages. This study examines abusive language and hate speech detection on Twitter data that also contains five local languages: Javanese, Sundanese, Madurese, Minangkabau, and Musi. As a preliminary study for each local language, we evaluate which machine learning methods perform best at detecting abusive language and hate speech on Twitter. We use several machine learning algorithms as classifiers, such as Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest Decision Tree (RFDT), with TF-IDF weighted word n-grams and character n-grams as features. The experiments use 5-fold cross-validation and are evaluated by F1-score. The results show that the SVM classifier with word n-gram features achieves the best F1-score on each dataset.
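
The feature extraction and evaluation steps named in the abstract can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation: the abstract does not state the n-gram sizes, the TF-IDF variant, the tokenizer, or how the folds were drawn, so the smoothed IDF formula, the whitespace tokenizer, the contiguous unshuffled folds, and all function names below are assumptions made for illustration. No classifier is included.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Word n-grams of a lowercased, whitespace-tokenized text (assumed tokenizer)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams of a lowercased text, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf(docs_terms):
    """Weight each term by raw term frequency times a smoothed IDF.

    Assumption: idf = ln((1 + N) / (1 + df)) + 1. The abstract does not
    say which TF-IDF variant was used.
    """
    n_docs = len(docs_terms)
    df = Counter(term for terms in docs_terms for term in set(terms))
    weighted = []
    for terms in docs_terms:
        tf = Counter(terms)
        weighted.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return weighted


def kfold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, test) index pairs.

    Assumption: no shuffling and no stratification.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a full experiment, each fold's training indices would be used to fit a classifier on the TF-IDF vectors, and the F1-scores on the held-out indices would be averaged across the five folds.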

Original language: English
Title of host publication: Recent Advances in Information and Communication Technology 2021 - Proceedings of the 17th International Conference on Computing and Information Technology, IC2IT 2021
Editors: Phayung Meesad, Sunantha Sodsee, Watchareewan Jitsakul, Sakchai Tangwannawit
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 88-98
Number of pages: 11
ISBN (Print): 9783030797560
Publication status: Published - 2021
Event: 17th International Conference on Computing and Information Technology, IC2IT 2021 - Bangkok, Thailand
Duration: 13 May 2021 - 14 May 2021

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 251
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 17th International Conference on Computing and Information Technology, IC2IT 2021
Country/Territory: Thailand
City: Bangkok
Period: 13/05/21 - 14/05/21

Keywords

  • Abusive
  • Hate speech
  • Indonesian local languages
  • Machine learning
  • Twitter
