TY - GEN
T1 - DBpedia entities expansion in automatically building dataset for Indonesian NER
AU - Alfina, Ika
AU - Manurung, Ruli
AU - Fanany, Mohamad Ivan
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/3/6
Y1 - 2017/3/6
N2 - Named Entity Recognition (NER) plays a significant role in Information Extraction (IE). For English, NER systems have achieved excellent performance, but for the Indonesian language they still need considerable improvement. Building a reliable NER system with a machine learning approach requires a massive dataset to train the classifier. Several studies have proposed methods for automatically building a dataset for Indonesian NER, using Indonesian Wikipedia articles as the source of the dataset and DBpedia as the reference for determining entity types automatically. The objective of our research is to improve the quality of the automatically tagged dataset. We propose a new method for using DBpedia as the reference for named entities. We created rules for expanding the DBpedia entity corpus for the categories person, place, and organization. The resulting dataset is used to train an Indonesian NER classifier with the Stanford NER tool. The evaluation shows that our method improves recall significantly but has lower precision than the previous research.
KW - DBpedia
KW - NER
KW - automatic tagging
UR - http://www.scopus.com/inward/record.url?scp=85016954805&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS.2016.7872784
DO - 10.1109/ICACSIS.2016.7872784
M3 - Conference contribution
AN - SCOPUS:85016954805
T3 - 2016 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2016
SP - 335
EP - 340
BT - 2016 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2016
Y2 - 15 October 2016 through 16 October 2016
ER -