TY - GEN
T1 - Automatic open domain information extraction from Indonesian text
AU - Gultom, Yohanes
AU - Wibowo, Wahyu Catur
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/2
Y1 - 2017/7/2
AB - The availability of a vast number of digital documents, exceeding human processing capability, calls for an automatic method of extracting information from any text document regardless of its domain. Unfortunately, open domain information extraction (open IE) systems are language-specific, and there is no published system for the Indonesian language. This paper introduces a system that extracts entity relations from Indonesian text in triple format using an NLP pipeline, a rule-based candidate generator, a rule-based token expander, and a machine-learning-based triple selector. We cross-validate four candidate classifiers (logistic regression, SVM, MLP, and Random Forest) on our dataset and find that Random Forest is the best classifier for the triple selector, achieving an F1 score of 0.60 (0.62 precision and 0.58 recall). The low score is largely due to the simplistic candidate generation rules and the limited coverage of the dataset.
UR - http://www.scopus.com/inward/record.url?scp=85050674633&partnerID=8YFLogxK
U2 - 10.1109/IWBIS.2017.8275098
DO - 10.1109/IWBIS.2017.8275098
M3 - Conference contribution
AN - SCOPUS:85050674633
T3 - Proceedings - WBIS 2017: 2017 International Workshop on Big Data and Information Security
SP - 23
EP - 30
BT - Proceedings - WBIS 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 International Workshop on Big Data and Information Security, WBIS 2017
Y2 - 23 September 2017 through 24 September 2017
ER -