Building an Indonesian rule-based part-of-speech tagger

Fam Rashel, Andry Luthfi, Arawinda Dinakaramani, Ruli Manurung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

24 Citations (Scopus)

Abstract

This paper describes a rule-based part-of-speech tagger for the Indonesian language. The system tokenizes documents, taking multi-word expressions into account, and recognizes named entities. It then assigns tags to every token, starting with closed-class words and moving on to open-class words, and disambiguates the tags using a set of manually defined rules. The system currently obtains an accuracy of 79% on a manually tagged corpus of roughly 250,000 tokens.
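The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicons, the name list, the tag labels, and the single disambiguation rule below are all hypothetical placeholders chosen for illustration, and the function names are assumptions.

```python
import re

# All lexicons and tag labels here are hypothetical toy data, not the paper's resources.
MULTI_WORD = {("rumah", "sakit"): "NN"}            # "hospital"
NAMES = {"budi"}                                   # tiny stand-in name list
CLOSED_CLASS = {"di": {"IN"}, "dan": {"CC"}, "itu": {"PR"}}
OPEN_CLASS = {"makan": {"VB"}, "bisa": {"MD", "NN"}, "buku": {"NN"}}


def tokenize(text):
    """Split into words and punctuation, then merge known two-word expressions."""
    raw = re.findall(r"\w+|[^\w\s]", text)
    tokens, i = [], 0
    while i < len(raw):
        pair = tuple(w.lower() for w in raw[i:i + 2])
        if len(pair) == 2 and pair in MULTI_WORD:
            tokens.append(" ".join(raw[i:i + 2]))
            i += 2
        else:
            tokens.append(raw[i])
            i += 1
    return tokens


def candidate_tags(token, sentence_initial):
    """Collect possible tags: multi-word, closed-class, named entity, then open-class."""
    low = token.lower()
    key = tuple(low.split())
    if key in MULTI_WORD:
        return {MULTI_WORD[key]}
    if low in CLOSED_CLASS:
        return CLOSED_CLASS[low]
    if low in NAMES or (token[0].isupper() and not sentence_initial):
        return {"NNP"}
    if low in OPEN_CLASS:
        return OPEN_CLASS[low]
    if not token[0].isalnum():
        return {"Z"}                               # punctuation
    return {"NN"}                                  # fallback guess for unknown words


def disambiguate(cands):
    """Apply one illustrative hand-written rule to tokens with several candidates."""
    tags = []
    for i, options in enumerate(cands):
        if len(options) == 1:
            tags.append(next(iter(options)))
        elif options == {"MD", "NN"}:
            # Assumed rule: modal reading if the next token can be a verb.
            next_is_verb = i + 1 < len(cands) and "VB" in cands[i + 1]
            tags.append("MD" if next_is_verb else "NN")
        else:
            tags.append(sorted(options)[0])        # arbitrary but deterministic
    return tags


def tag(text):
    tokens = tokenize(text)
    cands = [candidate_tags(t, i == 0) for i, t in enumerate(tokens)]
    return list(zip(tokens, disambiguate(cands)))


print(tag("Budi bisa makan di rumah sakit."))
```

In this sketch the ambiguous word "bisa" is tagged as a modal when followed by a verb and as a noun otherwise; a real rule set would need many such context rules, which is presumably where most of the manual effort lies.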

Original language: English
Title of host publication: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
Editors: Minghui Dong, Yanfeng Lu, Rafael E. Banchs, Bali Ranaivo-Malancon
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 70-73
Number of pages: 4
ISBN (Electronic): 9781479953301
Publication status: Published - 3 Dec 2014
Event: International Conference on Asian Language Processing 2014, IALP 2014 - Kuching, Malaysia
Duration: 20 Oct 2014 to 22 Oct 2014

Publication series

Name: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014

Conference

Conference: International Conference on Asian Language Processing 2014, IALP 2014
Country: Malaysia
City: Kuching
Period: 20/10/14 to 22/10/14

Keywords

  • disambiguation rule
  • part of speech tag
  • token
