A Hybrid of Rule-based and HMM-based Part-of-Speech Tagger for Indonesian

Muhammad Ridho Ananda, Muhammad Yudistira Hanifmuti, Ika Alfina

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Aksara is an Indonesian NLP tool that conforms to the Universal Dependencies annotation guidelines. So far, Aksara can perform four tasks: word segmentation, lemmatization, POS tagging, and morphological feature analysis. However, one of its weaknesses is that it has not solved the word sense disambiguation problem. The objective of this work is to build a hybrid of rule-based and Hidden Markov Model (HMM) based POS taggers that uses the output of Aksara’s rule-based POS tagger and resolves the remaining ambiguity with an HMM and the Viterbi algorithm. We train the HMM with bigram and trigram models. The hybrid model is evaluated with 10-fold cross-validation and achieves acceptable results, with the trigram model slightly better: it reaches 86.62% accuracy and an average F1-score of 82.32%, while the bigram model reaches 86.47% accuracy and an average F1-score of 81.55%. The experiments also show that the hybrid of rule-based and HMM-based tagging outperforms the HMM-based model alone, with a margin of 2.03% in accuracy.
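
The hybrid idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bigram HMM with add-one smoothing, and it assumes the rule-based step is available as a per-word set of candidate tags (the `candidates` argument, a hypothetical stand-in for Aksara's output). The function names and the toy training sentences are made up for illustration. Viterbi then only searches over the tags the rule-based step allows.

```python
import math
from collections import defaultdict

START = "<s>"  # pseudo-tag marking the start of a sentence


def train_bigram_hmm(tagged_sentences):
    """Count tag-bigram transitions and word emissions from tagged sentences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    tags, vocab = set(), set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            word = word.lower()
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(tags), len(vocab)


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability of `key` given a count table."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, candidates, trans, emit, tags, vocab_size):
    """Most likely tag sequence, restricted to each word's candidate tags.

    candidates[i] is the set of tags allowed for words[i] (assumed to come
    from a rule-based tagger); an empty set means "no constraint".
    """
    best, back = [{}], [{}]
    for tag in (candidates[0] or tags):
        best[0][tag] = (log_prob(trans[START], tag, len(tags))
                        + log_prob(emit[tag], words[0].lower(), vocab_size))
        back[0][tag] = None
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in (candidates[i] or tags):
            emission = log_prob(emit[tag], words[i].lower(), vocab_size)
            score, prev = max(
                (best[i - 1][p] + log_prob(trans[p], tag, len(tags)) + emission, p)
                for p in best[i - 1]
            )
            best[i][tag] = score
            back[i][tag] = prev
    # Follow back-pointers from the best final tag.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy data (illustrative only): "bisa" is ambiguous between AUX and NOUN.
TRAIN = [
    [("saya", "PRON"), ("bisa", "AUX"), ("makan", "VERB")],
    [("bisa", "NOUN"), ("ular", "NOUN"), ("berbahaya", "ADJ")],
    [("dia", "PRON"), ("bisa", "AUX"), ("tidur", "VERB")],
]
trans, emit, tags, vocab_size = train_bigram_hmm(TRAIN)
words = ["dia", "bisa", "makan"]
candidates = [{"PRON"}, {"AUX", "NOUN"}, {"VERB"}]  # hypothetical rule-based output
print(viterbi(words, candidates, trans, emit, tags, vocab_size))
```

In this sketch the rule-based candidates shrink the search space, and the HMM only decides among the tags that remain ambiguous. A trigram variant would condition each transition on the two previous tags instead of one.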

Original language: English
Title of host publication: 2021 International Conference on Asian Language Processing, IALP 2021
Editors: Deyi Xiong, Ridong Jiang, Yanfeng Lu, Minghui Dong, Haizhou Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 280-285
Number of pages: 6
ISBN (Electronic): 9781665483117
Publication status: Published - 2021
Event: 2021 International Conference on Asian Language Processing, IALP 2021 - Singapore, Singapore
Duration: 11 Dec 2021 to 13 Dec 2021

Publication series

Name: 2021 International Conference on Asian Language Processing, IALP 2021

Conference

Conference: 2021 International Conference on Asian Language Processing, IALP 2021
Country/Territory: Singapore
City: Singapore
Period: 11/12/21 to 13/12/21

Keywords

  • Hidden Markov Model
  • Hybrid model
  • POS tagging
  • Universal Dependencies
  • Viterbi
