Hate speech detection on Indonesian long text documents using machine learning approach

Nofa Aulia, Indra Budi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

6 Citations (Scopus)

Abstract

Hate speech on social media has grown in recent years, which makes it important to understand the problem, and an automatic hate speech detection system is needed to help counter it. Many studies have addressed hate speech detection in short documents such as Twitter data. To our knowledge, however, research on long documents is rare; we suppose the task is harder there because the message of the text may be hidden. In this research, we explore hate speech detection on long Indonesian documents using a machine learning approach. We build a new Indonesian hate speech dataset from Facebook. The experiments show that the best performance is obtained with a Support Vector Machine (SVM) classifier using TF-IDF, character quad-gram, word unigram, and lexicon features, which yields an F1-score of 85%.
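
To illustrate the feature families named in the abstract, the following is a minimal sketch, not the authors' implementation. It builds TF-IDF weights over character quad-grams and word unigrams and appends a lexicon match count. The function names, the whitespace tokenizer, the smoothed IDF formula, and the two-word toy lexicon are all assumptions made for illustration; the abstract does not specify the actual preprocessing, weighting variant, or lexicon. The resulting sparse vectors are the kind of input an SVM classifier would then be trained on; that step is left out here.

```python
import math
from collections import Counter

# Hypothetical toy lexicon for illustration only; the lexicon used in the
# paper is not given in the abstract.
TOY_LEXICON = {"bodoh", "tolol"}


def char_ngrams(text, n=4):
    """Return overlapping character n-grams (quad-grams when n=4)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_unigrams(text):
    """Return lowercase, whitespace-separated tokens (assumed tokenizer)."""
    return text.lower().split()


def extract_terms(text):
    """Combine both term families, prefixed so they cannot collide."""
    terms = ["c:" + gram for gram in char_ngrams(text)]
    terms += ["w:" + word for word in word_unigrams(text)]
    return terms


def tfidf_vectors(docs):
    """Sparse TF-IDF per document, with IDF = ln((1 + N) / (1 + df)) + 1."""
    term_lists = [extract_terms(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))  # count each term once per document
    vectors = []
    for terms in term_lists:
        term_freq = Counter(terms)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def lexicon_count(text, lexicon=TOY_LEXICON):
    """Count tokens that appear in the lexicon."""
    return sum(1 for word in word_unigrams(text) if word in lexicon)


def featurize(docs):
    """Return one sparse feature dict per document: TF-IDF plus lexicon count."""
    vectors = tfidf_vectors(docs)
    for vec, doc in zip(vectors, docs):
        vec["lex:count"] = float(lexicon_count(doc))
    return vectors
```

Calling `featurize(["kamu bodoh sekali", "selamat pagi semua"])` returns two dictionaries keyed by prefixed terms (`c:` for character quad-grams, `w:` for word unigrams) plus a `lex:count` entry. In practice a library vectorizer and SVM implementation would likely replace these hand-written helpers.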

Original language: English
Title of host publication: ICCAI 2019 - 2019 5th International Conference on Computing and Artificial Intelligence
Publisher: Association for Computing Machinery
Pages: 164-169
Number of pages: 6
ISBN (Electronic): 9781450361064
Publication status: Published - 19 Apr 2019
Event: 5th International Conference on Computing and Artificial Intelligence, ICCAI 2019 - Bali, Indonesia
Duration: 19 Apr 2019 to 22 Apr 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Computing and Artificial Intelligence, ICCAI 2019
Country/Territory: Indonesia
City: Bali
Period: 19/04/19 to 22/04/19

Keywords

  • Hate speech detection
  • Long documents
  • Machine learning
  • SVM
