TY - GEN
T1 - Selecting the UD v2 Morphological Features for Indonesian Dependency Treebank
AU - Alfina, Ika
AU - Zeman, Daniel
AU - Dinakaramani, Arawinda
AU - Budi, Indra
AU - Suhartanto, Heru
N1 - Publisher Copyright: © 2020 IEEE. Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2020/12/4
Y1 - 2020/12/4
N2 - The objectives of our work are to propose the relevant Universal Dependencies (UD) morphological features for Indonesian dependency treebank and to apply the proposed features to an existing treebank. We propose the use of 14 UD v2 features and the corresponding 27 feature-value tags. To evaluate the quality of the resulting treebank, we built models for lemmatization, POS tagging, morphological features analysis, and dependency parsing using UDPipe, a trainable pipeline for tokenization, tagging, lemmatization, and dependency parsing of CoNLL-U files. For lemmatization, POS tagging, and morphological features analysis tasks, the resulting models have F1-score of more than 93% that shows that the consistency of annotations for the columns LEMMA, UPOS, and FEATS in the treebank is already good. However, the accuracy of the Indonesian dependency parser built is still only 82.59% for UAS and 79.83% for LAS. The experiments also show that morphological features information has no or little impact on improving the quality of lemmatization, POS tagging, and dependency parsing models for Indonesian.
KW - annotation guidelines
KW - dependency treebank
KW - morphological features
KW - Universal Dependencies
UR - http://www.scopus.com/inward/record.url?scp=85097671510&partnerID=8YFLogxK
U2 - 10.1109/IALP51396.2020.9310513
DO - 10.1109/IALP51396.2020.9310513
M3 - Conference contribution
AN - SCOPUS:85097671510
T3 - 2020 International Conference on Asian Language Processing, IALP 2020
SP - 104
EP - 109
BT - 2020 International Conference on Asian Language Processing, IALP 2020
A2 - Lu, Yanfeng
A2 - Dong, Minghui
A2 - Soon, Lay-Ki
A2 - Gan, Keng Hoon
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 International Conference on Asian Language Processing, IALP 2020
Y2 - 4 December 2020 through 6 December 2020
ER -