Modified DBpedia entities expansion for tagging automatically NER dataset

Ika Alfina, Septiviana Savitri, Mohamad Ivan Fanany

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Citation (Scopus)

Abstract

Developing an NER system with a machine learning approach requires a large dataset, which is costly to build if the labeling is done manually. Previous work proposed methods for automatically tagging an Indonesian NER dataset, using Wikipedia articles as the source of the dataset and DBpedia as the reference for entity types. However, the quality of the resulting dataset was still inadequate. A method named DBpedia Entities Expansion (DEE) introduced several rules to expand the named entities in DBpedia in order to improve recall, but it did not remove the noise that lowers precision, especially for person names. The objective of this research is to propose a modification to the DEE method whose main focus is removing invalid names from the list of person names in the Expanded DBpedia. We call this modification Modified DEE (M-DEE). The evaluation shows that M-DEE improves the precision for person names by around 3% compared to the original DEE. By adding gazetteers for place and organization names to the Expanded DBpedia created by M-DEE, a margin of about 10% in the overall F1-score for all types was achieved.
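To illustrate why invalid entries in a person-name list hurt precision, the following is a minimal sketch and not the authors' implementation. It assumes a greedy longest-match tagger over a small in-memory gazetteer and a token-level evaluation; the function names, the example sentence, and the noisy entry "presiden" are all hypothetical, and the actual DEE and M-DEE rules are not given in the abstract.

```python
from typing import Dict, List, Tuple

Gazetteer = Dict[Tuple[str, ...], str]


def tag_tokens(tokens: List[str], gazetteer: Gazetteer, max_len: int = 4) -> List[str]:
    """Label tokens by greedy longest match against a gazetteer (hypothetical helper)."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                for j in range(i, i + n):
                    labels[j] = gazetteer[key]
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


def precision_recall_f1(gold: List[str], predicted: List[str], entity_type: str):
    """Token-level precision, recall, and F1 for one entity type (an assumed metric)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == entity_type and g == entity_type)
    fp = sum(1 for g, p in pairs if p == entity_type and g != entity_type)
    fn = sum(1 for g, p in pairs if p != entity_type and g == entity_type)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    tokens = ["Presiden", "Joko", "Widodo", "mengunjungi", "Jakarta"]
    gold = ["O", "Person", "Person", "O", "Place"]

    # Hypothetical expanded list containing one invalid person name ("presiden").
    noisy = {("joko", "widodo"): "Person", ("presiden",): "Person", ("jakarta",): "Place"}
    # The same list after the invalid entry is removed.
    clean = {k: v for k, v in noisy.items() if k != ("presiden",)}

    for name, gaz in (("noisy", noisy), ("clean", clean)):
        predicted = tag_tokens(tokens, gaz)
        print(name, predicted, precision_recall_f1(gold, predicted, "Person"))
```

In this toy case, removing the single invalid entry raises person-name precision while recall stays the same, which is the kind of effect the abstract attributes to noise removal.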

Original language: English
Title of host publication: 2017 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 216-221
Number of pages: 6
ISBN (Electronic): 9781538631720
Publication status: Published - 4 May 2018
Event: 9th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017 - Jakarta, Indonesia
Duration: 28 Oct 2017 to 29 Oct 2017

Publication series

Name: 2017 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
Volume: 2018-January

Conference

Conference: 9th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
Country: Indonesia
City: Jakarta
Period: 28/10/17 to 29/10/17

Keywords

  • building dataset
  • DBpedia
  • NER
  • noise reduction
