Named entity recognition on Indonesian microblog messages

Natanael Taufik, Alfan F. Wicaksono, Mirna Adriani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Citations (Scopus)

Abstract

This paper describes a model for named-entity recognition (NER) on Indonesian microblog messages, a task that is useful for higher-level tasks and text mining applications on Indonesian microblogs. We treat the task as a sequence labeling problem and address it with a machine learning approach. We also propose various word-level and orthographic features, including some that are specific to the Indonesian language. Finally, in our experiment, we compared our model with a baseline model previously proposed for Indonesian formal documents rather than microblog messages. Our contribution is two-fold: (1) we developed an NER tool for Indonesian microblog messages, a task that had not been addressed before, and (2) we developed an NER corpus of around 600 Indonesian microblog messages, available for future development.
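As a rough illustration of what word-level and orthographic features for sequence labeling can look like, the following minimal Python sketch maps each token of a message to a feature dictionary. The abstract does not list the paper's actual features, so every feature name here, the `PERSON_TITLES` cue list, and the whitespace tokenization are assumptions made for illustration, not the authors' feature set.

```python
import re

# Hypothetical cue list for illustration only; not taken from the paper.
# These are common Indonesian honorifics that often precede a person's name.
PERSON_TITLES = {"pak", "bu", "bapak", "ibu"}


def token_features(tokens, i):
    """Return a feature dict for tokens[i] of a tokenized message."""
    tok = tokens[i]
    # Use a sentinel for the first token, which has no left neighbour.
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    return {
        # Word-level features
        "lower": tok.lower(),
        "prev_lower": prev_tok,
        "prev_is_person_title": prev_tok in PERSON_TITLES,
        # Orthographic features
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        # Microblog-specific surface cues
        "is_mention": tok.startswith("@"),
        "is_hashtag": tok.startswith("#"),
        "is_url": re.match(r"https?://", tok) is not None,
    }


def message_features(message):
    """Split a message on whitespace and featurize every token."""
    tokens = message.split()
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in message_features("Pak Budi ke Bandung @temanku"):
        print(feats)
```

A list of feature dictionaries like this, one per token, is the usual input shape for a sequence labeler such as a conditional random field, which would then predict one entity tag per token.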

Original language: English
Title of host publication: Proceedings of the 2016 International Conference on Asian Language Processing, IALP 2016
Editors: Minghui Dong, Chung-Hsien Wu, Yanfeng Lu, Haizhou Li, Yuen-Hsien Tseng, Liang-Chih Yu, Lung-Hao Lee
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 358-361
Number of pages: 4
ISBN (Electronic): 9781509009213
DOIs
Publication status: Published - 10 Mar 2017
Event: 20th International Conference on Asian Language Processing, IALP 2016 - Tainan, Taiwan, Province of China
Duration: 21 Nov 2016 to 23 Nov 2016

Publication series

Name: Proceedings of the 2016 International Conference on Asian Language Processing, IALP 2016

Conference

Conference: 20th International Conference on Asian Language Processing, IALP 2016
Country/Territory: Taiwan, Province of China
City: Tainan
Period: 21/11/16 to 23/11/16
