Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus

Arawinda Dinakaramani, Fam Rashel, Andry Luthfi, Ruli Manurung

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

70 Citations (Scopus)

Abstract

We describe our work on designing a linguistically principled part-of-speech (POS) tagset for the Indonesian language. The process involves a detailed study and analysis of existing tagsets and the manual tagging of an Indonesian corpus. The results of this work are an Indonesian POS tagset consisting of 23 tags and an Indonesian corpus of over 250,000 lexical tokens that have been manually tagged using this tagset.

Original language: English
Title of host publication: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
Editors: Minghui Dong, Yanfeng Lu, Rafael E. Banchs, Bali Ranaivo-Malancon
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 66-69
Number of pages: 4
ISBN (Electronic): 9781479953301
Publication status: Published - 3 Dec 2014
Event: International Conference on Asian Language Processing 2014, IALP 2014 - Kuching, Malaysia
Duration: 20 Oct 2014 to 22 Oct 2014

Publication series

Name: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014

Conference

Conference: International Conference on Asian Language Processing 2014, IALP 2014
Country/Territory: Malaysia
City: Kuching
Period: 20/10/14 to 22/10/14

Keywords

  • Indonesian
  • Part of speech tagset
  • POS
