Developing an online Indonesian corpora repository

Ruli Manurung, Bayu Distiawan, Desmond Darma Putra

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

8 Citations (Scopus)

Abstract

This paper describes efforts to develop an online repository of Indonesian corpora, together with its associated functions and services, designed to support a wide variety of use cases and applications. Two design considerations are ensuring the sustainability and accessibility of the corpora, and enabling open enrichment through annotation. The presented model supports OLAC-compliant metadata, is built atop an OAIS-compliant core repository, and exposes data and functionality via RESTful web services. A prototype implementation is presented that allows users to upload, browse, and search the collection; the extensible content model currently supports POS tagging. The future plan is to package the language-independent aspects of the system and release them as open source, to aid the development of corpora repositories for other languages.

Original language: English
Title of host publication: PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
Pages: 243-249
Number of pages: 7
Publication status: Published - 1 Dec 2010
Event: 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24 - Sendai, Japan
Duration: 4 Nov 2010 to 7 Nov 2010

Publication series

Name: PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

Conference

Conference: 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24
Country/Territory: Japan
City: Sendai
Period: 4/11/10 to 7/11/10

Keywords

  • Annotation
  • Corpora
  • Digital repositories
  • Indonesian
  • Metadata
