Building an Indonesian named entity recognizer using Wikipedia and DBPedia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract

This paper describes the development of an Indonesian NER system using online data such as Wikipedia and DBPedia. The system is based on the Stanford NER system [8] and uses training documents constructed automatically from Wikipedia. Each entity, i.e. a word or phrase that has a hyperlink, in the Wikipedia documents is tagged according to information obtained from DBPedia. In this first version, we are interested in only three entity types: Person, Place, and Organization. The system is evaluated using cross-validation and also against a manually annotated gold standard. Under cross-validation, our Indonesian NER obtained precision and recall values above 90%, whereas the evaluation against the gold standard shows that it achieves high precision but very low recall.
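The automatic construction of training documents described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `DBPEDIA_TYPES` dictionary is a hypothetical stand-in for the type information obtained from DBPedia, the function names are invented for illustration, and whitespace tokenization is a simplifying assumption. The tab-separated token/label output is the column layout commonly used for Stanford NER training files.

```python
import re

# Hypothetical lookup standing in for DBPedia type information;
# the actual mapping used in the paper is not given in the abstract.
DBPEDIA_TYPES = {
    "Joko Widodo": "Person",
    "Jakarta": "Place",
    "Universitas Indonesia": "Organization",
}

# Matches wikitext links of the form [[Target]] or [[Target|anchor text]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def wikitext_to_rows(text):
    """Turn wikitext with [[links]] into a list of (token, label) pairs.

    Tokens inside a link get the type of the link target if it is known;
    every other token gets the background label "O".
    """
    rows = []
    pos = 0
    for match in LINK.finditer(text):
        # Plain text before the link is unlabeled.
        rows += [(tok, "O") for tok in text[pos:match.start()].split()]
        target = match.group(1).strip()
        anchor = (match.group(2) or match.group(1)).strip()
        label = DBPEDIA_TYPES.get(target, "O")
        rows += [(tok, label) for tok in anchor.split()]
        pos = match.end()
    # Remaining plain text after the last link.
    rows += [(tok, "O") for tok in text[pos:].split()]
    return rows


def to_tsv(rows):
    """Serialize rows as one 'token<TAB>label' pair per line."""
    return "\n".join(f"{tok}\t{label}" for tok, label in rows)


if __name__ == "__main__":
    sample = "[[Joko Widodo]] lahir di [[Surakarta]] dan bekerja di [[Jakarta|ibu kota]] ."
    print(to_tsv(wikitext_to_rows(sample)))
```

In this sketch a linked phrase whose target is missing from the lookup falls back to `O`. That is one plausible way entities end up unlabeled in automatically built training data, though the abstract does not state the cause of the low recall on the gold standard.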

Original language: English
Title of host publication: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
Editors: Minghui Dong, Yanfeng Lu, Rafael E. Banchs, Bali Ranaivo-Malancon
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 19-22
Number of pages: 4
ISBN (Electronic): 9781479953301
Publication status: Published - 3 Dec 2014
Event: International Conference on Asian Language Processing 2014, IALP 2014 - Kuching, Malaysia
Duration: 20 Oct 2014 to 22 Oct 2014

Publication series

Name: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014

Conference

Conference: International Conference on Asian Language Processing 2014, IALP 2014
Country: Malaysia
City: Kuching
Period: 20/10/14 to 22/10/14

Keywords

  • dbpedia
  • name entity recognition
  • stanford ner
  • wikipedia
