TY - GEN
T1 - Building an Indonesian named entity recognizer using Wikipedia and DBPedia
AU - Luthfi, Andry
AU - Distiawan, Bayu
AU - Manurung, Ruli
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/3
Y1 - 2014/12/3
AB - This paper describes the development of an Indonesian NER system using online data such as Wikipedia and DBPedia. The system is based on the Stanford NER system [8] and utilizes training documents constructed automatically from Wikipedia. Each entity in the Wikipedia documents, i.e. a word or phrase that has a hyperlink, is tagged according to information obtained from DBPedia. In this first version, we are interested in only three entity types: Person, Place, and Organization. The system is evaluated using cross-validation and against a manually annotated gold standard. Under cross-validation, our Indonesian NER obtains precision and recall values above 90%, whereas the evaluation against the gold standard shows that it achieves high precision but very low recall.
KW - dbpedia
KW - named entity recognition
KW - stanford ner
KW - wikipedia
UR - http://www.scopus.com/inward/record.url?scp=84941106249&partnerID=8YFLogxK
U2 - 10.1109/IALP.2014.6973520
DO - 10.1109/IALP.2014.6973520
M3 - Conference contribution
AN - SCOPUS:84941106249
T3 - Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
SP - 19
EP - 22
BT - Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
A2 - Dong, Minghui
A2 - Lu, Yanfeng
A2 - Banchs, Rafael E.
A2 - Ranaivo-Malancon, Bali
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Conference on Asian Language Processing 2014, IALP 2014
Y2 - 20 October 2014 through 22 October 2014
ER -