Resources for syntactic parsing of Indonesian are very limited: only two dependency treebanks are publicly available, and both are small. Moreover, we found that the word segmentation method used by both treebanks needs improvement. In this work, we therefore propose a revision of one of these treebanks, the Indonesian Parallel Universal Dependencies treebank. Besides improving word segmentation, we also improved the POS tagging and syntactic annotations. Because Indonesian grammar has some special structures, we also propose how to adapt the UDv2 annotation guidelines to these Indonesian grammar rules. To evaluate the quality of the new treebank, we built an Indonesian dependency parser model using the Parsito (UDPipe) parser. Under ten-fold cross-validation, the model built using the revised treebank achieved a UAS of 83.33% and an LAS of 79.39%, compared with a UAS of 73.32% and an LAS of 65.98% for the model built using the original treebank.
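For readers unfamiliar with the two metrics, the following is a minimal sketch of how UAS (unlabeled attachment score) and LAS (labeled attachment score) are commonly computed. It is not the authors' evaluation script: the function name `attachment_scores`, the `Arc` type, and the four-token toy data are assumptions made purely for illustration.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency relation label).
# This representation is an assumption for the sketch, not the paper's format.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    n = len(gold)
    # UAS: a token counts if its predicted head matches the gold head.
    heads_correct = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: a token counts only if both head and relation label match.
    arcs_correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * heads_correct / n, 100.0 * arcs_correct / n


# Invented toy sentence with four tokens (head 0 marks the root).
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "iobj")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

In the toy data, the third token has the wrong head and the fourth has the right head but the wrong label, so LAS is lower than UAS. In a ten-fold cross-validation setting, scores like these would be computed on each held-out fold and then aggregated.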
|Number of pages||9|
|Publication status||Published - 1 Jan 2019|
|Event||33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019 - Hakodate, Japan|
|Duration||13 Sept 2019 → 15 Sept 2019|