A gold standard dependency treebank for Indonesian

Ika Alfina, Arawinda Dinakaramani, Mohamad Ivan Fanany, Heru Suhartanto

Research output: Contribution to conference › Paper › peer-review

9 Citations (Scopus)


Resources for syntactic parsing of Indonesian are very limited: only two dependency treebanks are publicly available, and both are small. Moreover, we found that the word segmentation method used by both treebanks needs improvement. Therefore, in this work we proposed a revision of one of these treebanks, the Indonesian Parallel Universal Dependencies treebank. Besides improving word segmentation, we also improved the POS tagging and syntactic annotations. Because Indonesian grammar has some special structures, we also proposed how to adapt the UDv2 annotation guidelines to those Indonesian grammar rules. To evaluate the quality of the new treebank, we built an Indonesian dependency parser model using the Parsito (UDPipe) parser. Under ten-fold cross-validation, the model built using the revised treebank achieved a UAS of 83.33% and an LAS of 79.39%, compared with a UAS of 73.32% and an LAS of 65.98% for the model built using the original treebank.
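For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how unlabeled and labeled attachment scores (UAS and LAS) are commonly computed. It is not the authors' evaluation code: the function name `attachment_scores`, the `(head, relation)` token representation, and the toy sentence are all hypothetical, and the sketch assumes gold and predicted trees share identical tokenization.

```python
from typing import List, Tuple

# Hypothetical token representation: (head index, dependency relation).
# Head index 0 stands for the artificial root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned gold/predicted tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    # UAS counts tokens whose predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head AND relation label both match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))

    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Toy four-token sentence (illustrative data, not taken from the treebank).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

In the toy example, three of four heads are correct and two of four tokens have both the correct head and the correct label. In a ten-fold cross-validation setting, such scores would typically be computed on each held-out fold and then averaged.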

Original language: English
Number of pages: 9
Publication status: Published - 1 Jan 2019
Event: 33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019 - Hakodate, Japan
Duration: 13 Sept 2019 - 15 Sept 2019


Conference: 33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019

