TY - GEN
T1 - Aksara
T2 - 2020 International Conference on Asian Language Processing, IALP 2020
AU - Hanifmuti, Muhammad Yudistira
AU - Alfina, Ika
N1 - Publisher Copyright: © 2020 IEEE. Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2020/12/4
Y1 - 2020/12/4
N2 - The objective of this work is to build Aksara, an Indonesian morphological analyzer that conforms to the Universal Dependencies (UD) annotation guidelines, specifically UD v2. Many Indonesian morphological analyzers have been developed, but as far as we know none conforms to the UD annotation guidelines. To build Aksara, we follow the same approach as MorphInd, another Indonesian morphological analyzer, which uses the finite-state compiler Foma. Aksara performs four tasks: 1) word segmentation, 2) lemmatization, 3) POS tagging, and 4) morphological feature analysis. To evaluate the tool, we used an Indonesian dependency treebank that conforms to UD v2 as the gold standard. We also compare the performance of Aksara with that of MorphInd by mapping MorphInd output to the CoNLL-U format. The experimental results show that Aksara outperforms MorphInd on all four tasks. For word segmentation, Aksara achieves an accuracy of 96.9%; for case-sensitive lemmatization, an accuracy of 94.83%; for POS tagging, an F1-score of 88.2%; and for morphological feature analysis, nine of the 18 feature-value tags implemented so far have an F1-score above 80%.
KW - lemmatization
KW - morphological analyzer
KW - POS tagging
KW - Universal Dependencies
KW - word segmentation
UR - http://www.scopus.com/inward/record.url?scp=85099879161&partnerID=8YFLogxK
U2 - 10.1109/IALP51396.2020.9310490
DO - 10.1109/IALP51396.2020.9310490
M3 - Conference contribution
AN - SCOPUS:85099879161
T3 - 2020 International Conference on Asian Language Processing, IALP 2020
SP - 86
EP - 91
BT - 2020 International Conference on Asian Language Processing, IALP 2020
A2 - Lu, Yanfeng
A2 - Dong, Minghui
A2 - Soon, Lay-Ki
A2 - Gan, Keng Hoon
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 December 2020 through 6 December 2020
ER -