Aksara: An Indonesian Morphological Analyzer that Conforms to the UD v2 Annotation Guidelines

Muhammad Yudistira Hanifmuti, Ika Alfina

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

The objective of this work is to build Aksara, an Indonesian morphological analyzer that conforms to the Universal Dependencies (UD) annotation guidelines, specifically UD v2. Many Indonesian morphological analyzers have been developed, but to the best of our knowledge none conforms to the UD annotation guidelines. To build Aksara, we use the same approach as MorphInd, another Indonesian morphological analyzer, which is built with the finite-state compiler Foma. Aksara can perform four tasks: 1) word segmentation, 2) lemmatization, 3) POS tagging, and 4) morphological feature analysis. To evaluate the tool, we used an Indonesian dependency treebank that conforms to UD v2 as the gold standard. We also compared Aksara with MorphInd by mapping MorphInd's output to the CoNLL-U format. The experimental results show that Aksara outperforms MorphInd on all four tasks. Aksara achieves an accuracy of 96.9% on word segmentation, a case-sensitive accuracy of 94.83% on lemmatization, and an F1-score of 88.2% on POS tagging. For morphological feature analysis, nine of the 18 feature-value tags implemented so far achieve an F1-score above 80%.
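The abstract reports morphological-feature results as per-tag F1 scores over Feature=Value tags. As a hedged illustration of how such scores can be computed from CoNLL-U output, the sketch below splits the FEATS column (the sixth CoNLL-U column, e.g. `Number=Plur|PronType=Prs`, with `_` meaning no features) into individual tags and scores each tag separately. It assumes gold and predicted tokens are already aligned one to one; the paper's actual alignment and scoring procedure may differ, and the function names `parse_feats` and `per_tag_f1` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def parse_feats(feats: str) -> Set[str]:
    """Split a CoNLL-U FEATS value into a set of 'Feature=Value' tags.

    '_' (or an empty string) means the token has no features.
    """
    if not feats or feats == "_":
        return set()
    return set(feats.split("|"))


def per_tag_f1(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Compute F1 for each Feature=Value tag over token-aligned FEATS values.

    Hypothetical helper: assumes gold[i] and pred[i] describe the same token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be token-aligned")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g_str, p_str in zip(gold, pred):
        g, p = parse_feats(g_str), parse_feats(p_str)
        for tag in g & p:  # predicted and correct
            tp[tag] += 1
        for tag in p - g:  # predicted but not in gold
            fp[tag] += 1
        for tag in g - p:  # in gold but missed
            fn[tag] += 1
    scores: Dict[str, float] = {}
    for tag in set(tp) | set(fp) | set(fn):
        predicted = tp[tag] + fp[tag]
        actual = tp[tag] + fn[tag]
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / actual if actual else 0.0
        denom = precision + recall
        scores[tag] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Usage with three illustrative tokens (not data from the paper):
gold = ["Number=Plur|Person=3|PronType=Prs", "Voice=Act", "_"]
pred = ["Number=Plur|PronType=Prs", "Voice=Act", "Number=Sing"]
scores = per_tag_f1(gold, pred)
above_80 = sum(f > 0.8 for f in scores.values())
```

Scoring each Feature=Value pair independently, rather than requiring the whole FEATS string to match, is what makes a statement such as "nine of the 18 tags exceed 80% F1" possible: a token with one wrong feature still earns credit for the features it got right.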

Original language: English
Title of host publication: 2020 International Conference on Asian Language Processing, IALP 2020
Editors: Yanfeng Lu, Minghui Dong, Lay-Ki Soon, Keng Hoon Gan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 86-91
Number of pages: 6
ISBN (Electronic): 9781728176895
Publication status: Published - 4 Dec 2020
Event: 2020 International Conference on Asian Language Processing, IALP 2020 - Kuala Lumpur, Malaysia
Duration: 4 Dec 2020 to 6 Dec 2020

Publication series

Name: 2020 International Conference on Asian Language Processing, IALP 2020

Conference

Conference: 2020 International Conference on Asian Language Processing, IALP 2020
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 4/12/20 to 6/12/20

Keywords

  • lemmatization
  • morphological analyzer
  • POS tagging
  • Universal Dependencies
  • word segmentation

