Abstract
Topic detection is a powerful method for uncovering latent structures in documents. A general framework for clustering-based topic detection consists of two steps: representation learning and topic detection through clustering. In this study, bidirectional encoder representations from transformers (BERT) is used for representation learning because it learns contextual text representations, which allow it to capture the context of each word based on its surrounding words. The text representations obtained from BERT are then clustered to detect topics. The clustering models used in this study are deep embedded clustering (DEC) and improved deep embedded clustering (IDEC), deep learning-based clustering techniques that simultaneously transform the data into a lower-dimensional space and optimize the clusters. Combining BERT as the text representation model with DEC or IDEC yields a deep learning model for topic detection. After the word sets representing the topics are obtained, the models are evaluated by examining their sensitivity to hyperparameters and their topic coherence. The simulations showed that DEC and IDEC are robust to hyperparameter changes. DEC and IDEC also outperformed uniform manifold approximation and projection (UMAP)-based K-means and eigenspace-based fuzzy c-means (EFCM) in terms of topic coherence measured with Word2Vec (TC-W2V).
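The DEC and IDEC clustering objectives referred to above can be illustrated with a short sketch. This is a minimal NumPy version assuming the formulations from the original DEC and IDEC papers: a Student's t soft assignment Q between embedded points and cluster centroids, a sharpened target distribution P, and the clustering loss L_c = KL(P || Q); IDEC adds an autoencoder reconstruction term, giving L = L_r + gamma * L_c. The abstract does not state the kernel parameter `alpha`, the weight `gamma`, or the autoencoder architecture, so those values, the function names, and the random inputs (standing in for encoder outputs of BERT vectors) are illustrative assumptions rather than the study's implementation.

```python
import numpy as np


def soft_assignment(z, mu, alpha=1.0):
    """Student's t soft assignment q_ij of embeddings z (n, d) to centroids mu (k, d)."""
    # Squared Euclidean distance between every point and every centroid -> shape (n, k).
    dist_sq = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize so each point's assignments sum to 1.
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened auxiliary distribution P used as the self-training target."""
    # Square q and divide by the soft cluster frequencies f_j = sum_i q_ij.
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_clustering_loss(p, q, eps=1e-12):
    """Clustering loss L_c = KL(P || Q), summed over points and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def idec_loss(x, x_rec, p, q, gamma=0.1):
    """IDEC objective: squared reconstruction error plus gamma-weighted clustering loss."""
    l_r = float(np.sum((x - x_rec) ** 2))
    return l_r + gamma * kl_clustering_loss(p, q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(6, 2))   # hypothetical low-dimensional embeddings
    mu = z[:2].copy()             # hypothetical initial centroids (k = 2)
    q = soft_assignment(z, mu)
    p = target_distribution(q)
    print("soft assignments:\n", q.round(3))
    print("DEC clustering loss:", kl_clustering_loss(p, q))
```

In DEC only L_c is minimized after pretraining, which can distort the embedding space; IDEC keeps the reconstruction term so the encoder preserves local structure while the clusters are refined.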
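TC-W2V, the coherence measure used in the comparison above, rates a topic by how semantically close its top words are in a Word2Vec embedding space. The sketch below assumes the common definition (mean cosine similarity over all pairs of a topic's top words, then averaged over topics); the number of top words per topic is not given in the abstract, and the toy two-dimensional `vectors` dictionary is hypothetical, standing in for a trained Word2Vec model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def topic_coherence_w2v(topic_words, vectors):
    """Mean pairwise cosine similarity of a topic's top words.

    Words missing from the embedding vocabulary are skipped; a topic with
    fewer than two known words scores 0.0.
    """
    words = [w for w in topic_words if w in vectors]
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def mean_coherence(topics, vectors):
    """Average TC-W2V over all detected topics."""
    return sum(topic_coherence_w2v(t, vectors) for t in topics) / len(topics)


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for a trained Word2Vec model.
    vectors = {
        "goal": [1.0, 0.0], "match": [1.0, 0.0],
        "vote": [0.0, 1.0], "election": [0.0, 1.0],
    }
    print(mean_coherence([["goal", "match"], ["vote", "election"]], vectors))
    print(topic_coherence_w2v(["goal", "vote"], vectors))
```

A topic whose words point in the same direction scores close to 1, while a topic that mixes unrelated words scores lower, which is why higher TC-W2V is read as better topic quality.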
Original language | English |
---|---|
Pages (from-to) | 331-336 |
Number of pages | 6 |
Journal | International Conference on Computer, Control, Informatics and its Applications, IC3INA |
Issue number | 2024 |
Publication status | Published - 2024 |
Event | 11th International Conference on Computer, Control, Informatics and its Applications, IC3INA 2024 - Hybrid, Bandung, Indonesia. Duration: 9 Oct 2024 → 10 Oct 2024 |
Keywords
- deep learning
- representation learning
- topic detection
- unsupervised learning