Abusive language and hate speech detection for Javanese and Sundanese languages in tweets: Dataset and preliminary study

Shofianina Dwi Ananda Putri, Muhammad Okky Ibrohim, Indra Budi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Indonesia is an archipelago with many ethnic groups and local languages, which adds variety to how its people communicate. Every region in Indonesia has its own distinct culture, accents, and languages. These demographic conditions can influence the characteristics of the language used on social media such as Twitter, where Indonesians often use their local languages to communicate and express their thoughts in tweets. Research on identifying hate speech and abusive language has become an attractive and growing topic, yet research on Indonesian local languages is still rare. This paper analyzes the use of machine learning approaches such as Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest Decision Tree (RFDT) for detecting hate speech and abusive language in Sundanese and Javanese, two Indonesian local languages. The classifiers were used with several term-weighting features, such as word n-grams and character n-grams. The experiments are evaluated using the F-measure, and the approach achieves an F-measure of over 60% for both local languages.
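To make the feature and metric terminology in the abstract concrete, the following is a minimal sketch of word n-gram and character n-gram extraction and of the F-measure for a single class. It is not the authors' implementation: the function names (`word_ngrams`, `char_ngrams`, `term_frequencies`, `f_measure`), the lowercasing and whitespace tokenization, and the raw term-frequency weighting are all assumptions made for illustration, and the abstract does not state which preprocessing or weighting scheme was used.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams as space-joined strings (assumed whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character n-grams, spaces included (assumed lowercasing)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def term_frequencies(ngrams):
    """Raw term-frequency weighting: count how often each n-gram occurs."""
    return Counter(ngrams)


def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1) for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = "aku arep lunga"  # made-up example string, not taken from the dataset
    print(word_ngrams(sample, 2))
    print(term_frequencies(char_ngrams(sample, 3)).most_common(3))
    print(f_measure([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a full pipeline, feature vectors built this way would be passed to a classifier such as NB, SVM, or a random forest, and the F-measure would be computed on held-out tweets; those steps are omitted here.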

Original language: English
Title of host publication: 2021 11th International Workshop on Computer Science and Engineering, WCSE 2021
Publisher: International Workshop on Computer Science and Engineering (WCSE)
Pages: 461-465
Number of pages: 5
ISBN (Electronic): 9789811817915
Publication status: Published - 2021
Event: 2021 11th International Workshop on Computer Science and Engineering, WCSE 2021 - Shanghai, Virtual, China
Duration: 19 Jun 2021 to 21 Jun 2021

Publication series

Name: 2021 11th International Workshop on Computer Science and Engineering, WCSE 2021

Conference

Conference: 2021 11th International Workshop on Computer Science and Engineering, WCSE 2021
Country/Territory: China
City: Shanghai, Virtual
Period: 19/06/21 to 21/06/21

Keywords

  • Abusive
  • Hate speech
  • Indonesian local language
  • Javanese
  • Sundanese
  • Twitter
