Sexual Violence Classification as Hate Speech using Indonesian Tweet

Muammar Notareza Ramadhan, Indra Budi, Aris Budi Santoso, Ryan Randy Suryono

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Hate speech is communication, either direct or through the media, performed by groups or individuals with the aim of provoking, inciting, or insulting another group or individual. A total of 3,640 instances of hate speech spread across various social media, and there were 677 KBGO (online gender-based violence) cases, dominated by sexual violence spread through online media. Our research aims to produce the best classification model, with high accuracy, by comparing several combinations of machine learning techniques. We collected 9,035 Twitter user opinions to use as a dataset. Of the 6,089 opinions that were successfully annotated, 5,102 were classified as non-hate speech and 987 as hate speech. We propose an SVM classification model with TF-IDF (unigram) as the feature extraction method and oversampling methods such as ROS and SMOTE to address the imbalanced dataset problem and improve classification performance. The classification model with the SVM algorithm reaches the best accuracy, 0.942, with an F1-score of 0.940.
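The random oversampling (ROS) step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `random_oversample`, the label strings, and the placeholder tweets are assumptions made for illustration, and only the class counts (5,102 and 987) come from the abstract.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    matches the size of the largest class (random oversampling, ROS)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        # Pool of original samples belonging to this class.
        pool = [t for t, y in zip(texts, labels) if y == label]
        # Draw with replacement until the class reaches the target size.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels


# Placeholder data mirroring the class counts reported in the abstract
# (5,102 non-hate speech vs. 987 hate speech); the strings are not real tweets.
labels = ["non_hate"] * 5102 + ["hate"] * 987
texts = [f"tweet_{i}" for i in range(len(labels))]

balanced_texts, balanced_labels = random_oversample(texts, labels)
balanced_counts = Counter(balanced_labels)
print(balanced_counts)
```

In practice, oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data. SMOTE differs in that it synthesizes new minority samples by interpolating between neighbors in feature space rather than duplicating existing ones.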

Original language: English
Title of host publication: Proceeding - 2022 International Symposium on Information Technology and Digital Innovation
Subtitle of host publication: Technology Innovation During Pandemic, ISITDI 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 114-120
Number of pages: 7
ISBN (Electronic): 9781665461160
DOIs
Publication status: Published - 2022
Event: 2022 International Symposium on Information Technology and Digital Innovation, ISITDI 2022 - Virtual, Online, Indonesia
Duration: 27 Jul 2022 to 28 Jul 2022

Publication series

Name: Proceeding - 2022 International Symposium on Information Technology and Digital Innovation: Technology Innovation During Pandemic, ISITDI 2022

Conference

Conference: 2022 International Symposium on Information Technology and Digital Innovation, ISITDI 2022
Country/Territory: Indonesia
City: Virtual, Online
Period: 27/07/22 to 28/07/22

Keywords

  • Hate speech
  • machine learning
  • oversampling
  • twitter
