TY - GEN
T1 - Developing an online Indonesian corpora repository
AU - Manurung, Ruli
AU - Distiawan, Bayu
AU - Putra, Desmond Darma
PY - 2010/12/1
Y1 - 2010/12/1
N2 - This paper describes efforts to develop an online repository of Indonesian corpora (and its associated functions and services) that has been designed to support a wide variety of use cases and applications. Two design considerations are ensuring sustainability and accessibility of the corpora, and enabling open enrichment through annotation. The presented model supports OLAC-compliant metadata, is built atop an OAIS-compliant core repository, and exposes data and functionality via RESTful web services. A prototype implementation is presented, which allows users to upload, browse, and search the collection, whose extensible content model currently supports POS tagging. The future plan is for language-independent aspects of the system to be packaged up and released as an open-source package to aid the development of corpora repositories for other languages.
KW - Annotation
KW - Corpora
KW - Digital repositories
KW - Indonesian
KW - Metadata
UR - http://www.scopus.com/inward/record.url?scp=84863860337&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84863860337
SN - 9784905166009
T3 - PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
SP - 243
EP - 249
BT - PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
T2 - 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24
Y2 - 4 November 2010 through 7 November 2010
ER -