TY - GEN
T1 - The accuracy of fuzzy C-means in lower-dimensional space for topic detection
AU - Murfi, Hendri
N1 - Funding Information:
Acknowledgment. This work was supported by Universitas Indonesia under PDUPT 2018 grant. Any opinions, findings, and conclusions or recommendations are the authors’ and do not necessarily reflect those of the sponsor.
Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - Topic detection is an automatic method for discovering topics in textual data. The standard methods for topic detection are nonnegative matrix factorization (NMF) and latent Dirichlet allocation (LDA). An alternative is a clustering approach such as k-means or fuzzy c-means (FCM). FCM extends the k-means method in the sense that each text may have more than one topic. However, FCM works well for low-dimensional textual data and fails for high-dimensional textual data. One approach to overcoming this problem is to transform the textual data into a lower-dimensional space, i.e., an eigenspace; the resulting method is called eigenspace-based FCM (EFCM). First, the textual data are transformed into an eigenspace using truncated singular value decomposition. FCM is then performed on the eigenspace data to identify the memberships of the textual data in clusters. Using these memberships, we generate topics from the high-dimensional textual data in the original space. In this paper, we examine the accuracy of EFCM for topic detection. Our simulations show that the accuracy of EFCM lies between the accuracies of LDA and NMF in terms of both topic interpretation and topic recall.
AB - Topic detection is an automatic method for discovering topics in textual data. The standard methods for topic detection are nonnegative matrix factorization (NMF) and latent Dirichlet allocation (LDA). An alternative is a clustering approach such as k-means or fuzzy c-means (FCM). FCM extends the k-means method in the sense that each text may have more than one topic. However, FCM works well for low-dimensional textual data and fails for high-dimensional textual data. One approach to overcoming this problem is to transform the textual data into a lower-dimensional space, i.e., an eigenspace; the resulting method is called eigenspace-based FCM (EFCM). First, the textual data are transformed into an eigenspace using truncated singular value decomposition. FCM is then performed on the eigenspace data to identify the memberships of the textual data in clusters. Using these memberships, we generate topics from the high-dimensional textual data in the original space. In this paper, we examine the accuracy of EFCM for topic detection. Our simulations show that the accuracy of EFCM lies between the accuracies of LDA and NMF in terms of both topic interpretation and topic recall.
KW - Accuracy
KW - Clustering
KW - Eigenspace
KW - Fuzzy c-means
KW - Topic detection
UR - http://www.scopus.com/inward/record.url?scp=85058569561&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-05755-8_32
DO - 10.1007/978-3-030-05755-8_32
M3 - Conference contribution
AN - SCOPUS:85058569561
SN - 9783030057541
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 321
EP - 334
BT - Smart Computing and Communication - 3rd International Conference, SmartCom 2018, Proceedings
A2 - Qiu, Meikang
PB - Springer Verlag
T2 - 3rd International Conference on Smart Computing and Communication, SmartCom 2018
Y2 - 10 December 2018 through 12 December 2018
ER -