Selecting the UD v2 Morphological Features for Indonesian Dependency Treebank

Ika Alfina, Daniel Zeman, Arawinda Dinakaramani, Indra Budi, Heru Suhartanto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract

The objectives of our work are to propose the relevant Universal Dependencies (UD) morphological features for an Indonesian dependency treebank and to apply the proposed features to an existing treebank. We propose the use of 14 UD v2 features and the corresponding 27 feature-value tags. To evaluate the quality of the resulting treebank, we built models for lemmatization, POS tagging, morphological feature analysis, and dependency parsing using UDPipe, a trainable pipeline for tokenization, tagging, lemmatization, and dependency parsing of CoNLL-U files. For the lemmatization, POS tagging, and morphological feature analysis tasks, the resulting models achieve F1-scores above 93%, which shows that the annotations in the LEMMA, UPOS, and FEATS columns of the treebank are already consistent. However, the accuracy of the resulting Indonesian dependency parser is still only 82.59% UAS and 79.83% LAS. The experiments also show that morphological feature information has little or no impact on the quality of lemmatization, POS tagging, and dependency parsing models for Indonesian.
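The UAS and LAS figures quoted in the abstract are attachment scores: UAS counts tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. The following is a minimal sketch of that computation, assuming gold and predicted analyses share the same tokenization; the function name `attachment_scores` and the toy arcs are illustrative and are not taken from the paper, and shared-task evaluation scripts also handle token alignment, which this sketch omits.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency relation label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent for two token-aligned analyses.

    Assumption: gold and pred describe the same tokens in the same order.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty analysis")

    # UAS: the predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the relation label match.
    las_hits = sum(g == p for g, p in zip(gold, pred))

    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


# Hypothetical four-token sentence: token 3 has the wrong label,
# token 4 has the wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

Because every token counted for LAS is also counted for UAS, LAS can never exceed UAS, which matches the ordering of the two figures reported above.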

Original language: English
Title of host publication: 2020 International Conference on Asian Language Processing, IALP 2020
Editors: Yanfeng Lu, Minghui Dong, Lay-Ki Soon, Keng Hoon Gan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 104-109
Number of pages: 6
ISBN (Electronic): 9781728176895
Publication status: Published - 4 Dec 2020
Event: 2020 International Conference on Asian Language Processing, IALP 2020 - Kuala Lumpur, Malaysia
Duration: 4 Dec 2020 to 6 Dec 2020

Publication series

Name: 2020 International Conference on Asian Language Processing, IALP 2020

Conference

Conference: 2020 International Conference on Asian Language Processing, IALP 2020
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 4/12/20 to 6/12/20

Keywords

  • annotation guidelines
  • dependency treebank
  • morphological features
  • Universal Dependencies
