TY - GEN
T1 - A Comparative Study of Latent Semantics-based Anchor Word Selection Method for Separable Nonnegative Matrix Factorization
AU - Imami, Naufal Khairil
AU - Murfi, Hendri
AU - Wibowo, Arie
N1 - Funding Information:
This work was supported by Universitas Indonesia under PIT 9 2019. Any opinions, findings, and conclusions or recommendations are the authors' and do not necessarily reflect those of the sponsor.
Publisher Copyright:
© 2020 ACM.
PY - 2020/1/3
Y1 - 2020/1/3
N2 - Topic detection is a process used to analyze the words in a collection of textual data to determine the topics in the collection, how they relate to each other, and how these topics change over time. One recent topic detection method is Separable Nonnegative Matrix Factorization (SNMF), which solves nonnegative matrix factorization directly under the separability assumption. The SNMF method has three stages: generating a word co-occurrence matrix, determining the anchor words, and recovering the word-topic matrix. In this paper, we examine latent semantics-based methods to determine the anchor words of the topics. Our simulation shows that both latent semantics-based methods reach coherence scores comparable to those of the standard method while being more efficient in running time.
AB - Topic detection is a process used to analyze the words in a collection of textual data to determine the topics in the collection, how they relate to each other, and how these topics change over time. One recent topic detection method is Separable Nonnegative Matrix Factorization (SNMF), which solves nonnegative matrix factorization directly under the separability assumption. The SNMF method has three stages: generating a word co-occurrence matrix, determining the anchor words, and recovering the word-topic matrix. In this paper, we examine latent semantics-based methods to determine the anchor words of the topics. Our simulation shows that both latent semantics-based methods reach coherence scores comparable to those of the standard method while being more efficient in running time.
KW - latent semantics
KW - online news
KW - separable nonnegative matrix factorization
KW - Topic detection
KW - twitter
UR - http://www.scopus.com/inward/record.url?scp=85083339824&partnerID=8YFLogxK
U2 - 10.1145/3378904.3378906
DO - 10.1145/3378904.3378906
M3 - Conference contribution
AN - SCOPUS:85083339824
T3 - ACM International Conference Proceeding Series
SP - 89
EP - 92
BT - BDET 2020 - 2020 2nd International Conference on Big Data Engineering and Technology
PB - Association for Computing Machinery
T2 - 2nd International Conference on Big Data Engineering and Technology, BDET 2020
Y2 - 3 January 2020 through 5 January 2020
ER -