Adjusting Indonesian Multiword Expression Annotation to the Penn Treebank Format

Jessica Naraiswari Arwidarasti, Ika Alfina, Adila Alfa Krisnadhi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Multiword expressions (MWEs) have long been a pain in the neck, especially when determining their word classes in a syntactic treebank. Previous work proposed annotation guidelines for Indonesian MWEs that align with the Penn Treebank (PTB) format. However, we think that the proposed annotation still needs improvement. Therefore, this study proposes a new guideline for annotating Indonesian MWEs that conforms to the PTB format. We also revised the MWE annotation of an existing Indonesian constituency treebank of 1030 sentences to conform to the new guideline. To evaluate the quality of the revised treebank, we built an Indonesian constituency parser model using the revised treebank and the Stanford parser. The experiments show that the resulting parser achieves an F1-score of 69.97%.

Original language: English
Title of host publication: 2020 International Conference on Asian Language Processing, IALP 2020
Editors: Yanfeng Lu, Minghui Dong, Lay-Ki Soon, Keng Hoon Gan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 75-80
Number of pages: 6
ISBN (Electronic): 9781728176895
DOIs
Publication status: Published - 4 Dec 2020
Event: 2020 International Conference on Asian Language Processing, IALP 2020 - Kuala Lumpur, Malaysia
Duration: 4 Dec 2020 to 6 Dec 2020

Publication series

Name: 2020 International Conference on Asian Language Processing, IALP 2020

Conference

Conference: 2020 International Conference on Asian Language Processing, IALP 2020
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 4/12/20 to 6/12/20

Keywords

  • compound word
  • Indonesian
  • multiword expression
  • Penn Treebank
  • Stanford parser
