TY - GEN
T1 - A Hybrid of Rule-based and HMM-based Part-of-Speech Tagger for Indonesian
AU - Ridho Ananda, Muhammad
AU - Yudistira Hanifmuti, Muhammad
AU - Alfina, Ika
N1 - Funding Information:
Dissemination of this work is funded by a grant from Program Kompetisi Kampus Merdeka (PKKM) 2021, Faculty of Computer Science, Universitas Indonesia.
Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Aksara is an Indonesian NLP tool that conforms to the Universal Dependencies annotation guidelines. So far, Aksara can perform four tasks: word segmentation, lemmatization, POS tagging, and morphological feature analysis. However, one of its weaknesses is that it has not solved the word-sense disambiguation problem. This work’s objective is to build a hybrid of rule-based and Hidden Markov Model (HMM) based POS taggers that utilizes the output of Aksara’s rule-based POS tagger and resolves the ambiguity problem using an HMM and the Viterbi algorithm. We use bigram and trigram models to train the HMM. Our hybrid model is evaluated using 10-fold cross-validation and achieves acceptable results, with the trigram model slightly better. The trigram model achieved 86.62% accuracy and an average F1-score of 82.32%, while the bigram model achieved 86.47% accuracy and an average F1-score of 81.55%. The experiments also show that the hybrid rule-based and HMM-based model outperforms the HMM-based model alone by a margin of 2.03% in accuracy.
AB - Aksara is an Indonesian NLP tool that conforms to the Universal Dependencies annotation guidelines. So far, Aksara can perform four tasks: word segmentation, lemmatization, POS tagging, and morphological feature analysis. However, one of its weaknesses is that it has not solved the word-sense disambiguation problem. This work’s objective is to build a hybrid of rule-based and Hidden Markov Model (HMM) based POS taggers that utilizes the output of Aksara’s rule-based POS tagger and resolves the ambiguity problem using an HMM and the Viterbi algorithm. We use bigram and trigram models to train the HMM. Our hybrid model is evaluated using 10-fold cross-validation and achieves acceptable results, with the trigram model slightly better. The trigram model achieved 86.62% accuracy and an average F1-score of 82.32%, while the bigram model achieved 86.47% accuracy and an average F1-score of 81.55%. The experiments also show that the hybrid rule-based and HMM-based model outperforms the HMM-based model alone by a margin of 2.03% in accuracy.
KW - Hidden Markov Model
KW - Hybrid model
KW - POS tagging
KW - Universal Dependencies
KW - Viterbi
UR - http://www.scopus.com/inward/record.url?scp=85125187669&partnerID=8YFLogxK
U2 - 10.1109/IALP54817.2021.9675180
DO - 10.1109/IALP54817.2021.9675180
M3 - Conference contribution
AN - SCOPUS:85125187669
T3 - 2021 International Conference on Asian Language Processing, IALP 2021
SP - 280
EP - 285
BT - 2021 International Conference on Asian Language Processing, IALP 2021
A2 - Xiong, Deyi
A2 - Jiang, Ridong
A2 - Lu, Yanfeng
A2 - Dong, Minghui
A2 - Li, Haizhou
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 International Conference on Asian Language Processing, IALP 2021
Y2 - 11 December 2021 through 13 December 2021
ER -