TY - GEN
T1 - Accuracy of separable nonnegative matrix factorization for topic extraction
AU - Murfi, Hendri
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/11/24
Y1 - 2017/11/24
N2 - Topic extraction is an automatic method to extract topics from textual data. A popular method of topic extraction is latent Dirichlet allocation (LDA), which is a probabilistic topic model. Because learning the model parameters is computationally hard (NP-hard in general), several researchers have continued this work to design methods with polynomial complexity. An emerging alternative is the nonnegative matrix factorization (NMF)-based approach. Under a separability assumption, a direct method that runs in polynomial time has been proposed. In general, this algorithm works in three steps: generating a word co-occurrence matrix, choosing an anchor word for each topic, and then, in the recovery step, directly reconstructing the topics given the anchor words. In this paper, we examine the accuracy of separable nonnegative matrix factorization (SNMF). First, the accuracy of SNMF is strongly influenced by the choice of anchor words; in this case, the accuracy of SNMF improves significantly when the anchor words are found in an eigenspace rather than a random space. Moreover, SNMF achieves higher accuracy than LDA but lower accuracy than standard NMF.
AB - Topic extraction is an automatic method to extract topics from textual data. A popular method of topic extraction is latent Dirichlet allocation (LDA), which is a probabilistic topic model. Because learning the model parameters is computationally hard (NP-hard in general), several researchers have continued this work to design methods with polynomial complexity. An emerging alternative is the nonnegative matrix factorization (NMF)-based approach. Under a separability assumption, a direct method that runs in polynomial time has been proposed. In general, this algorithm works in three steps: generating a word co-occurrence matrix, choosing an anchor word for each topic, and then, in the recovery step, directly reconstructing the topics given the anchor words. In this paper, we examine the accuracy of separable nonnegative matrix factorization (SNMF). First, the accuracy of SNMF is strongly influenced by the choice of anchor words; in this case, the accuracy of SNMF improves significantly when the anchor words are found in an eigenspace rather than a random space. Moreover, SNMF achieves higher accuracy than LDA but lower accuracy than standard NMF.
KW - Eigenspace
KW - Separable nonnegative matrix factorization
KW - Singular value decomposition
KW - Topic extraction
UR - http://www.scopus.com/inward/record.url?scp=85042076307&partnerID=8YFLogxK
U2 - 10.1145/3162957.3162996
DO - 10.1145/3162957.3162996
M3 - Conference contribution
AN - SCOPUS:85042076307
T3 - ACM International Conference Proceeding Series
SP - 226
EP - 230
BT - Proceedings of the 3rd International Conference on Communication and Information Processing, ICCIP 2017
PB - Association for Computing Machinery
T2 - 3rd International Conference on Communication and Information Processing, ICCIP 2017
Y2 - 24 November 2017 through 26 November 2017
ER -