Utilizing Translation to Enhance NLP Models in Offensive Language and Hate Speech Identification

Sandy Kurniawan, Indra Budi

Research output: Contribution to journal › Article › peer-review

Abstract

The number of social media users in Indonesia has increased in recent years, and with it the amount of offensive language on these platforms. Offensive language can trigger conflicts between users, so it is necessary to identify its use on social media. This study focuses on identifying offensive language, hate speech, and hate speech targets on Twitter. The data were obtained from previous research on identifying offensive language and hate speech. Because the amount of training data strongly influences classification performance, this study adds data by means of translation. Classical machine learning algorithms (SVM and others) and deep learning algorithms (BiLSTM, CNN, and LSTM) are used as classifiers, with word n-grams and word embeddings as features. Three scenarios were evaluated, differing in the training data used to develop the classification model. The results show that scenario 3, which uses translation for data augmentation, can improve the classification model’s performance by 5%.
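The two ideas named in the abstract, word n-gram features and adding training data through translation, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which translation tool or direction was used, so `translate` is a hypothetical callable supplied by the caller, and the helper names `word_ngrams` and `augment_with_translation` are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (text, label)


def word_ngrams(text: str, n_max: int = 2) -> Counter:
    """Count word n-grams of length 1..n_max in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def augment_with_translation(
    original: Iterable[Example],
    extra: Iterable[Example],
    translate: Callable[[str], str],
) -> List[Example]:
    """Return the original examples followed by translated copies of the extra ones.

    `translate` is a hypothetical stand-in for whatever translation step is used;
    labels are carried over unchanged.
    """
    translated = [(translate(text), label) for text, label in extra]
    return list(original) + translated


if __name__ == "__main__":
    # Toy lookup table standing in for a real translation system (assumption).
    toy_table = {"have a nice day": "semoga harimu menyenangkan"}
    train = augment_with_translation(
        original=[("selamat pagi semua", "not_offensive")],
        extra=[("have a nice day", "not_offensive")],
        translate=lambda text: toy_table.get(text, text),
    )
    for text, label in train:
        print(label, dict(word_ngrams(text)))
```

The resulting n-gram counts could then be fed to any classifier. Keeping the translation step behind a plain callable makes it easy to compare training with and without the translated examples.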

Original language: English
Pages (from-to): 182-197
Journal: Jurnal Improsci
Volume: 1
Issue number: 4
Publication status: Published - 16 Feb 2024

Keywords

  • Deep Learning
  • Hate Speech
  • Offensive Language
  • Text Classification
  • Twitter
