A gold standard dependency treebank for Indonesian

Research output: Contribution to conference, Paper, peer-review

5 Citations (Scopus)

Abstract

Resources for syntactic parsing of Indonesian are very limited: only two dependency treebanks are publicly available, and both are small. Moreover, we found that the word segmentation method used in both treebanks needs improvement. Therefore, in this work we propose a revision of one of these treebanks, the Indonesian Parallel Universal Dependencies (PUD) treebank. Besides improving word segmentation, we also improved the POS tags and syntactic annotations. Because Indonesian grammar has some special structures, we also propose how to adapt the UD v2 annotation guidelines to those Indonesian grammar rules. To evaluate the quality of the new treebank, we built an Indonesian dependency parser model using the Parsito (UDPipe) parser. Using ten-fold cross-validation, the model built on the revised treebank achieved a UAS of 83.33% and a LAS of 79.39%, compared with a UAS of 73.32% and a LAS of 65.98% for the model built on the original treebank.
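The scores above are attachment scores. UAS (unlabeled attachment score) is the percentage of words whose predicted head matches the gold head; LAS (labeled attachment score) additionally requires the dependency relation label to match. The following is a minimal sketch of how these scores could be computed from CoNLL-U files and averaged over ten folds. It is not the authors' evaluation code. It assumes that gold and predicted files share the same word segmentation, skips multiword-token lines (such as a clitic-bearing form split into `buku` + `nya`) so that only syntactic words are scored, and by default compares only the universal part of each label (the part before `:`). It also assumes contiguous folds and a plain average of per-fold scores, since the abstract does not say how folds were drawn or how scores were aggregated. The function names and the toy sentence are hypothetical and not taken from the treebank.

```python
from statistics import mean

# Hypothetical toy example (not from the treebank): "Saya membaca bukunya."
# Line "3-4" is a multiword token split into the syntactic words "buku" and "nya".
GOLD = (
    "# text = Saya membaca bukunya.\n"
    "1\tSaya\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmembaca\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3-4\tbukunya\t_\t_\t_\t_\t_\t_\t_\tSpaceAfter=No\n"
    "3\tbuku\t_\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "4\tnya\t_\tPRON\t_\t_\t3\tnmod:poss\t_\t_\n"
    "5\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)
# Hypothetical parser output: wrong label on word 3, label without subtype on 4,
# wrong head on word 5.
PRED = (
    "# text = Saya membaca bukunya.\n"
    "1\tSaya\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmembaca\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3-4\tbukunya\t_\t_\t_\t_\t_\t_\t_\tSpaceAfter=No\n"
    "3\tbuku\t_\tNOUN\t_\t_\t2\tobl\t_\t_\n"
    "4\tnya\t_\tPRON\t_\t_\t3\tnmod\t_\t_\n"
    "5\t.\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)


def read_conllu(text):
    """Parse CoNLL-U text into sentences of (id, head, deprel) tuples.

    Comment lines, multiword-token ranges ("3-4") and empty nodes ("5.1")
    are skipped, so only syntactic words are scored.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((cols[0], cols[6], cols[7]))  # ID, HEAD, DEPREL
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold, pred, ignore_subtypes=True):
    """Return (UAS, LAS) in percent; assumes identical segmentation."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files differ in sentence count")
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_id, g_head, g_rel), (p_id, p_head, p_rel) in zip(g_sent, p_sent):
            if g_id != p_id:
                raise ValueError("gold and predicted word IDs do not align")
            if ignore_subtypes:       # compare "nmod" instead of "nmod:poss"
                g_rel, p_rel = g_rel.split(":")[0], p_rel.split(":")[0]
            total += 1
            if g_head == p_head:
                uas_hits += 1
                if g_rel == p_rel:    # LAS needs both head and label correct
                    las_hits += 1
    return 100 * uas_hits / total, 100 * las_hits / total


def k_fold_splits(items, k=10):
    """Yield (train, test) lists for k contiguous, near-equal folds."""
    n = len(items)
    for i in range(k):
        start, end = i * n // k, (i + 1) * n // k
        yield items[:start] + items[end:], items[start:end]


def average_scores(fold_scores):
    """Average a list of per-fold (UAS, LAS) pairs."""
    uas, las = zip(*fold_scores)
    return mean(uas), mean(las)


if __name__ == "__main__":
    print(attachment_scores(read_conllu(GOLD), read_conllu(PRED)))  # (80.0, 60.0)
```

On the toy pair, four of the five syntactic words get the correct head (UAS 80%), and three of those also get the correct label once subtypes are ignored (LAS 60%). In a real ten-fold run, a parser such as UDPipe would be trained on each `train` split and evaluated on the matching `test` split. That training step is outside this sketch.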

Original language: English
Pages: 1-9
Number of pages: 9
Publication status: Published - 1 Jan 2019
Event: 33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019 - Hakodate, Japan
Duration: 13 Sep 2019 to 15 Sep 2019

Conference

Conference: 33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019
Country/Territory: Japan
City: Hakodate
Period: 13/09/19 to 15/09/19
