Transformer-based Cross-Lingual Summarization using Multilingual Word Embeddings for English - Bahasa Indonesia

Research output: Contribution to journal, Article, peer-review

Abstract

Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a source document in another language. CLS is challenging because it involves two different languages. Traditionally, CLS is carried out as a pipeline of two steps: summarization and translation. This approach suffers from error propagation, since mistakes made in the first step carry over into the second. To address this problem, we present a novel end-to-end abstractive CLS model that does not explicitly use machine translation. The architecture is based on the Transformer, which has been shown to perform text generation well. The model is trained jointly on the CLS task and a monolingual summarization (MS) task. This is accomplished by adding a second decoder to handle the MS task, while the first decoder handles the CLS task. We also incorporate multilingual word embeddings (MWE) into the architecture to further improve the performance of the CLS models. Both English and Bahasa Indonesia are represented by MWE whose embeddings have already been mapped into the same vector space. MWE help to better map the relation between an input and an output that are in different languages. Experiments show that the proposed model achieves improvements of up to +0.2981 ROUGE-1, +0.2084 ROUGE-2, and +0.2771 ROUGE-L over the pipeline baselines, and up to +0.1288 ROUGE-1, +0.1185 ROUGE-2, and +0.1413 ROUGE-L over the end-to-end baselines.
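The gains above are reported in ROUGE-1, ROUGE-2, and ROUGE-L. As a rough illustration of what these metrics measure, here is a minimal sketch that computes them as F1 scores on whitespace-tokenized text. It rests on several assumptions: the abstract does not say whether the reported figures are F1, recall, or precision, nor which ROUGE implementation or preprocessing was used. The function names and the example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 from clipped n-gram overlap between two token lists."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Hypothetical example sentences (assumption: not from the paper's dataset).
reference = "pemerintah menaikkan harga bahan bakar".split()
candidate = "pemerintah menaikkan harga bensin".split()

print(round(rouge_n(candidate, reference, 1), 4))  # 0.6667
print(round(rouge_n(candidate, reference, 2), 4))  # 0.5714
print(round(rouge_l(candidate, reference), 4))     # 0.6667
```

On this 0-to-1 scale, a difference such as +0.2981 ROUGE-1 corresponds to nearly 30 points on the 0-to-100 scale that some papers use. Standard ROUGE toolkits add options such as stemming and multi-reference handling, which this sketch omits.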

Original language: English
Pages (from-to): 636-645
Number of pages: 10
Journal: International Journal of Advanced Computer Science and Applications
Volume: 13
Issue number: 12
Publication status: Published - 2022

Keywords

  • Automatic summarization
  • Cross-lingual summarization
  • Multilingual word embeddings
  • Transformer
