Normalisation of Indonesian-English Code-Mixed Text and its Effect on Emotion Classification

Evi Yulianti, Ajmal Kurnia, Mirna Adriani, Yoppy Setyo Duto

Research output: Contribution to journal › Article › peer-review

Abstract

Usage of code-mixed text has increased in recent years among Indonesian internet users, who often mix Indonesian-language with English-language text. Normalisation of this code-mixed text into Indonesian needs to be performed to capture the meaning of English parts of the text and process them effectively. We improve a state-of-the-art code-mixed Indonesian-English normalisation system by modifying its pipeline modules. We further analyse the effect of code-mixed normalisation on emotion classification tasks. Our approach significantly improved on a state-of-the-art Indonesian-English code-mixed text normalisation system in both the individual pipeline modules and the overall system. The new feature set in the language identification module showed an improvement of 4.26% in terms of F1 score. The combination of machine translation and ruleset in the lexical normalisation module improved BLEU score by 25.22% and lowered WER by 62.49%. The use of context in the translation module improved BLEU score by 2.5% and lowered WER by 8.84%. The effectiveness of the overall pipeline normalisation system increased by 32.11% and 33.82%, in terms of BLEU score and WER, respectively. Code-mixed normalisation also improved the accuracy of emotion classification by up to 37.74% in terms of F1 score.
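The abstract reports word error rate (WER) without saying how it was computed. As a point of reference, the minimal sketch below uses the common definition: word-level edit distance between a reference and a system output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions. They are not taken from the article or its evaluation scripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: whitespace tokenisation and unit cost for every
    substitution, insertion, and deletion. The article may tokenise
    or weight errors differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one English word left untranslated.
reference = "saya suka film ini"
hypothesis = "saya suka movie ini"
wer = word_error_rate(reference, hypothesis)
```

Here one of four reference words is wrong, so `wer` is 0.25. Lower is better, which is why the abstract describes the WER changes as reductions while the BLEU changes are increases.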

Original language: English
Pages (from-to): 674-685
Number of pages: 12
Journal: International Journal of Advanced Computer Science and Applications
Volume: 12
Issue number: 11
Publication status: Published - 2021

Keywords

  • Code-mixed normalisation
  • emotion classification
  • English
  • Indonesian
