TY - GEN
T1 - Analysis and implementation measurement of semantic similarity using content management information on WordNet
AU - Sagala, Tommy Wijaya
AU - Wati, Theresia
AU - Solikin
AU - Budi, Nur Fitriah Ayuning
AU - Hidayanto, Achmad Nizar
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - In natural language processing (NLP), measuring semantic similarity plays an important role. The resulting similarity scores are often used as the basis for NLP tasks such as question answering, document classification, and machine translation. This paper analyses test results, obtained on a recent dataset, of an implementation that uses information content (IC) derived from the WordNet taxonomy to measure semantic similarity values. The implementation's results are then compared against a gold standard dataset to measure performance. The dataset used for testing is SimLex-999. Performance is measured with both Pearson correlation and Spearman correlation, because each correlation has its own advantages and disadvantages. Based on the test results, the Seco formula yields a Pearson correlation of 0.583 and a Spearman correlation of 0.582, while the New Formula yields a Pearson correlation of 0.602 and a Spearman correlation of 0.594. The correlation results show a strong positive correlation. Therefore, the information content method on WordNet is feasible for measuring semantic similarity.
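The evaluation pipeline described in the abstract can be sketched as follows: compute an intrinsic information content for each concept in an is-a taxonomy with the Seco formula, IC(c) = 1 - log(hypo(c) + 1) / log(max_wn), score word pairs with a similarity measure, then correlate those scores with gold-standard ratings using Pearson and Spearman correlation. This is a minimal illustrative sketch, not the authors' implementation. Several pieces are assumptions: the tiny taxonomy (PARENT) stands in for WordNet; Resnik similarity (IC of the lowest common subsumer) is used as the IC-based measure because the record does not say which measure was paired with each formula; the "New Formula" is not reproduced; and the gold scores are invented for illustration, not taken from SimLex-999.

```python
import math
from statistics import mean

# Hypothetical toy is-a taxonomy (child -> parent); a stand-in for WordNet, NOT real data.
PARENT = {
    "animal": "entity", "dog": "animal", "cat": "animal",
    "puppy": "dog", "vehicle": "entity", "car": "vehicle",
}
# max_wn in the Seco formula: total number of concepts in the taxonomy.
MAX_WN = len(set(PARENT) | set(PARENT.values()))

def ancestors(concept):
    """Return the concept followed by all of its hypernyms up to the root."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

def hyponym_count(concept):
    """hypo(c): number of concepts strictly below `concept` in the taxonomy."""
    return sum(1 for node in PARENT if concept in ancestors(node)[1:])

def seco_ic(concept):
    """Intrinsic IC (Seco et al.): 1 - log(hypo(c) + 1) / log(max_wn)."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(MAX_WN)

def resnik(c1, c2):
    """Assumed similarity measure: IC of the most informative common subsumer."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(seco_ic(c) for c in common)

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical word pairs and invented gold ratings (0-10 scale), NOT SimLex-999 values.
pairs = [("puppy", "dog"), ("dog", "cat"), ("cat", "puppy"), ("dog", "car")]
gold = [9.0, 6.0, 5.5, 1.0]
predicted = [resnik(a, b) for a, b in pairs]

if __name__ == "__main__":
    print("Pearson: ", round(pearson(predicted, gold), 3))
    print("Spearman:", round(spearman(predicted, gold), 3))
```

Reporting both coefficients, as the abstract does, is useful because Pearson measures linear agreement with the raw gold scores, while Spearman only compares rank order and is therefore less sensitive to the very different scales of IC-based scores and human ratings. With real data, the toy taxonomy would be replaced by WordNet noun synsets and the invented ratings by the SimLex-999 scores.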
KW - Gold standard
KW - Natural language processing
KW - Pearson correlation
KW - Semantic similarity
KW - Spearman correlation
UR - http://www.scopus.com/inward/record.url?scp=85062385580&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS.2018.8618181
DO - 10.1109/ICACSIS.2018.8618181
M3 - Conference contribution
AN - SCOPUS:85062385580
T3 - 2018 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2018
SP - 337
EP - 342
BT - 2018 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2018
Y2 - 27 October 2018 through 28 October 2018
ER -