BERT-Based Deep Embedded Clustering for Topic Modeling

Research output: Contribution to journal › Conference article › peer-review

Abstract

Topic detection is a method for uncovering the latent structures in a document. A general framework for clustering-based topic detection consists of two steps: representation learning and topic detection with clustering. In this study, bidirectional encoder representations from transformers (BERT) is used for representation learning because of its ability to capture the context of each word from its surroundings. The text representations obtained from BERT are then clustered to detect topics, using deep embedded clustering (DEC) and improved deep embedded clustering (IDEC). DEC and IDEC are deep learning-based clustering techniques that simultaneously transform the data into a lower-dimensional space and optimize the clusters. Combining BERT as the text representation model with DEC and IDEC yields a deep learning model for topic detection. After obtaining the word sets that represent the topics, the models are evaluated by examining their sensitivity to hyperparameters and their topic coherence values. The simulations showed that DEC and IDEC are robust to hyperparameter changes. DEC and IDEC also outperformed uniform manifold approximation and projection (UMAP)-based K-means and eigenspace-based fuzzy c-means (EFCM) as measured by Word2Vec-based topic coherence (TC-W2V).
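
As an illustration of the cluster-optimization step that DEC performs in the lower-dimensional space, the sketch below computes the soft assignments, the sharpened target distribution, and the KL-divergence clustering loss from the original DEC formulation. It is a minimal NumPy sketch and not the authors' implementation: the encoder network and the BERT embedding step are omitted, the array `z` is synthetic data standing in for encoded document embeddings, and the function names and the choice `alpha=1.0` are assumptions made for this example. IDEC, in its original formulation, optimizes the same clustering loss together with an autoencoder reconstruction loss, which is not shown here.

```python
import numpy as np


def soft_assign(z, centroids, alpha=1.0):
    """Soft assignment q_ij of each embedding to each centroid (Student's t kernel)."""
    # Squared Euclidean distances, shape (n_docs, n_clusters).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize so that each row is a probability distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened targets p_ij: square q and divide by the soft cluster frequencies."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over documents and clusters."""
    return float(np.sum(p * np.log(p / q)))


# Synthetic stand-in for encoded document embeddings (assumption: 2 dimensions,
# two well-separated groups of five documents each).
rng = np.random.default_rng(0)
z = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 2)),
    rng.normal(3.0, 0.1, size=(5, 2)),
])
# Hypothetical initial centroids; in practice these usually come from K-means.
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])

q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
labels = q.argmax(axis=1)  # hard cluster label per document
```

In a full training loop, the loss would be backpropagated through the encoder and the centroids, and `p` would be recomputed periodically from the current `q`.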

Original language: English
Pages (from-to): 331-336
Number of pages: 6
Journal: International Conference on Computer, Control, Informatics and its Applications, IC3INA
Issue number: 2024
Publication status: Published - 2024
Event: 11th International Conference on Computer, Control, Informatics and its Applications, IC3INA 2024 - Hybrid, Bandung, Indonesia
Duration: 9 Oct 2024 to 10 Oct 2024

Keywords

  • deep learning
  • representation learning
  • topic detection
  • unsupervised learning
