Enhanced TextRank using weighted word embedding for text summarization

Evi Yulianti, Nicholas Pangestu, Meganingrum Arista Jiwanggi

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

The length of a news article may influence people’s interest in reading it. Text summarization can help by creating a shorter, representative version of an article to reduce reading time. This paper proposes using weighted word embeddings based on Word2Vec, FastText, and bidirectional encoder representations from transformers (BERT) models to enhance the TextRank summarization algorithm. The weighted word embeddings aim to create better sentence representations and thereby produce more accurate summaries. The results show that using (unweighted) word embeddings significantly improves the performance of the TextRank algorithm, with the best performance obtained by the summarization system using BERT word embeddings. When each word embedding is weighted using term frequency-inverse document frequency (TF-IDF), the performance of all systems using unweighted word embeddings improves significantly further, with the largest improvements achieved by the systems using Word2Vec (a 6.80% to 12.92% increase) and FastText (a 7.04% to 12.78% increase). Overall, our systems using weighted word embeddings outperform the TextRank method by up to 17.33% in ROUGE-1 and 30.01% in ROUGE-2. This demonstrates the effectiveness of weighted word embeddings in the TextRank algorithm for text summarization.
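The core idea described in the abstract — building sentence vectors as TF-IDF-weighted averages of word embeddings, then ranking sentences with TextRank over a similarity graph — can be sketched as below. This is a minimal illustration, not the paper's implementation: the helper names (`tfidf_weights`, `sentence_vectors`, `textrank_scores`) are hypothetical, tokenized sentences are assumed as input, and toy random vectors stand in for pretrained Word2Vec/FastText/BERT embeddings.

```python
import math
import numpy as np

def tfidf_weights(sentences):
    """Per-sentence TF-IDF weight for each word, treating each sentence as a document."""
    n = len(sentences)
    df = {}
    for sent in sentences:
        for w in set(sent):
            df[w] = df.get(w, 0) + 1
    weights = []
    for sent in sentences:
        tf = {w: sent.count(w) / len(sent) for w in sent}
        weights.append({w: tf[w] * math.log((1 + n) / (1 + df[w])) + 1e-9 for w in sent})
    return weights

def sentence_vectors(sentences, embed, weights):
    """TF-IDF-weighted average of word vectors -> one vector per sentence."""
    vecs = []
    for sent, wts in zip(sentences, weights):
        vs = np.array([embed[w] * wts[w] for w in sent])
        vecs.append(vs.sum(axis=0) / (sum(wts.values()) or 1.0))
    return np.array(vecs)

def textrank_scores(vecs, d=0.85, iters=50):
    """PageRank over a cosine-similarity sentence graph (the TextRank step)."""
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    sim = np.clip(sim, 0.0, None)          # keep only positive similarities as edges
    row_sums = sim.sum(axis=1, keepdims=True)
    trans = np.divide(sim, row_sums, out=np.zeros_like(sim), where=row_sums > 0)
    n = len(vecs)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - d) / n + d * (trans.T @ scores)
    return scores

# Toy usage: random vectors stand in for a real pretrained embedding model.
rng = np.random.default_rng(0)
docs = [["cat", "sat", "mat"], ["dog", "sat", "log"], ["cat", "dog", "play"]]
embed = {w: rng.normal(size=8) for s in docs for w in s}
scores = textrank_scores(sentence_vectors(docs, embed, tfidf_weights(docs)))
# Highest-scoring sentences form the extractive summary.
```

In this scheme the TF-IDF weight lets rare, informative words dominate the sentence vector, which the abstract reports as the source of the gains over plain (unweighted) embedding averages.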

Original language: English
Pages (from-to): 5472-5482
Number of pages: 11
Journal: International Journal of Electrical and Computer Engineering
Volume: 13
Issue number: 5
DOIs
Publication status: Published - Oct 2023

Keywords

  • Bidirectional encoder representations from transformers
  • FastText
  • Term frequency-inverse document frequency
  • Text summarization
  • TextRank
  • Weighted word embedding
  • Word2Vec
