DBpedia entities expansion in automatically building dataset for Indonesian NER

Ika Alfina, Ruli Manurung, Mohamad Ivan Fanany

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Citations (Scopus)

Abstract

Named Entity Recognition (NER) plays a significant role in Information Extraction (IE). For English, NER systems have achieved excellent performance, but for Indonesian, they still need considerable improvement. Building a reliable NER system with a machine learning approach requires a massive dataset to train the classifier. Several studies have proposed methods for automatically building a dataset for Indonesian NER, using Indonesian Wikipedia articles as the source and DBpedia as the reference for determining entity types automatically. The objective of our research is to improve the quality of the automatically tagged dataset. We propose a new method of using DBpedia as the reference for named entities. We created rules for expanding the DBpedia entity corpus for the categories person, place, and organization. The resulting dataset is used to train an Indonesian NER classifier with the Stanford NER tool. The evaluation shows that our method improves recall significantly but has lower precision than the previous research.
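As a rough illustration of the pipeline summarized in the abstract, the following minimal Python sketch tags tokens by looking them up in an entity list and writes them in the two-column, tab-separated token/label layout commonly used for Stanford NER training files. The abstract does not state the actual expansion rules, so `expand_person_entities` implements one hypothetical rule (adding each individual token of a multi-word person name). All function names, the sample gazetteer, and the label strings are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def expand_person_entities(gazetteer: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical expansion rule (not taken from the paper):
    for every multi-word PERSON entry, also add each single name token."""
    expanded = dict(gazetteer)
    for name, label in gazetteer.items():
        if label == "PERSON":
            for token in name.split():
                # Keep an existing label if the token is already listed.
                expanded.setdefault(token, "PERSON")
    return expanded


def tag_tokens(
    tokens: List[str], gazetteer: Dict[str, str], max_len: int = 4
) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup; tokens with no match get the label 'O'."""
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match_len, label = 0, "O"
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                match_len, label = n, gazetteer[candidate]
                break
        if match_len == 0:
            tagged.append((tokens[i], "O"))
            i += 1
        else:
            tagged.extend((tok, label) for tok in tokens[i:i + match_len])
            i += match_len
    return tagged


def to_tsv(tagged: List[Tuple[str, str]]) -> str:
    """One 'token<TAB>label' pair per line."""
    return "\n".join(f"{tok}\t{label}" for tok, label in tagged)


# Illustrative entity list; the entries and labels are made up for this sketch.
base = {"Joko Widodo": "PERSON", "Malang": "PLACE"}
expanded = expand_person_entities(base)
sentence = "Widodo mengunjungi Malang".split()
print(to_tsv(tag_tokens(sentence, expanded)))
```

With the unexpanded list, the token "Widodo" alone would be labeled `O`; after the hypothetical expansion it is labeled `PERSON`, which illustrates how a larger entity list can raise the number of tagged mentions.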

Original language: English
Title of host publication: 2016 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 335-340
Number of pages: 6
ISBN (Electronic): 9781509046294
Publication status: Published - 6 Mar 2017
Event: 8th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2016 - Malang, Indonesia
Duration: 15 Oct 2016 → 16 Oct 2016

Publication series

Name: 2016 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2016

Conference

Conference: 8th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2016
Country/Territory: Indonesia
City: Malang
Period: 15/10/16 → 16/10/16

Keywords

  • DBpedia
  • NER
  • automatic tagging
