Building Morphological Analyzer for Informal Text in Indonesian

I. Made Krisna Dwitama, Muhammad Salman Al Farisi, Ika Alfina, Arawinda Dinakaramani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Informal text is heavily used by Indonesians on social media. However, NLP tools that can process such text are still very limited. In this work, we built a morphological analyzer for informal Indonesian text by adding new rules for informal words to an existing Indonesian morphological analyzer named Aksara. We also enriched the Aksara lexicon with informal words. The tool can perform tokenization, lemmatization, and part-of-speech (POS) tagging. Aksara is rule-based: it uses a finite-state transducer built with the Foma compiler. To evaluate the tool, we created a gold standard of 102 sentences with 1434 tokens, of which around 30% are informal. The test results show that our tool has a tokenization accuracy of 97.21%, a case-insensitive lemmatization accuracy of 90.37%, and a POS tagging F1-score of 80%.
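As an illustration of the case-insensitive lemmatization accuracy reported above, the following is a minimal sketch. It assumes the gold and predicted lemmas are already aligned token by token. The function name `lemma_accuracy` and the toy data are hypothetical and are not taken from the paper or from Aksara.

```python
def lemma_accuracy(gold, predicted, case_sensitive=False):
    """Fraction of tokens whose predicted lemma matches the gold lemma."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        return 0.0
    if not case_sensitive:
        # Case-insensitive scoring: compare lowercased lemmas.
        gold = [g.lower() for g in gold]
        predicted = [p.lower() for p in predicted]
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical toy data, not drawn from the paper's gold standard.
gold_lemmas = ["saya", "makan", "Jakarta", "tidak"]
predicted_lemmas = ["saya", "makan", "jakarta", "enggak"]

insensitive = lemma_accuracy(gold_lemmas, predicted_lemmas)
sensitive = lemma_accuracy(gold_lemmas, predicted_lemmas, case_sensitive=True)
```

On this toy input, the capitalization mismatch on "Jakarta" counts as an error only under case-sensitive scoring, which is why the two scores differ.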

Original language: English
Title of host publication: Proceedings - ICACSIS 2022
Subtitle of host publication: 14th International Conference on Advanced Computer Science and Information Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 199-204
Number of pages: 6
ISBN (Electronic): 9781665489362
DOIs
Publication status: Published - 2022
Event: 14th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2022 - Virtual, Online, Indonesia
Duration: 1 Oct 2022 → 3 Oct 2022

Publication series

Name: Proceedings - ICACSIS 2022: 14th International Conference on Advanced Computer Science and Information Systems

Conference

Conference: 14th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2022
Country/Territory: Indonesia
City: Virtual, Online
Period: 1/10/22 → 3/10/22

Keywords

  • finite-state transducer
  • informal text
  • lemmatization
  • morphological analyzer
  • POS tagging
  • tokenization
