TY - JOUR
T1 - Transformer-based Cross-Lingual Summarization using Multilingual Word Embeddings for English - Bahasa Indonesia
AU - Abka, Achmad F.
AU - Azizah, Kurniawati
AU - Jatmiko, Wisnu
N1 - Funding Information:
This work was funded by PUTI Q3 Universitas Indonesia 2020-2021 under grant number NKB-4375/UN2.RST/HKP.05.00/2020 and supported as part of the Visiting Professorship Project on Gaining International Accreditation and Enhancing Academic Reputation Program, Universitas Indonesia 2020-2021.
Publisher Copyright:
© 2022, International Journal of Advanced Computer Science and Applications. All Rights Reserved.
PY - 2022
Y1 - 2022
N2 - Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a source document in another language. CLS is challenging because it involves two different languages. Traditionally, CLS is carried out in a pipeline scheme with two steps, summarization and translation, an approach that suffers from error propagation. To address this problem, we present a novel end-to-end abstractive CLS model that does not explicitly use machine translation. The architecture is based on the Transformer, which has proven effective at text generation. The model is jointly trained on the CLS task and a monolingual summarization (MS) task: a second decoder handles the MS task, while the first decoder handles the CLS task. We also incorporate multilingual word embeddings (MWE) into the architecture to further improve the performance of the CLS models. Both English and Bahasa Indonesia are represented by MWE whose embeddings have already been mapped into the same vector space; this helps the model better relate inputs and outputs in different languages. Experiments show that the proposed model achieves improvements of up to +0.2981 ROUGE-1, +0.2084 ROUGE-2, and +0.2771 ROUGE-L over the pipeline baselines, and up to +0.1288 ROUGE-1, +0.1185 ROUGE-2, and +0.1413 ROUGE-L over the end-to-end baselines.
AB - Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a source document in another language. CLS is challenging because it involves two different languages. Traditionally, CLS is carried out in a pipeline scheme with two steps, summarization and translation, an approach that suffers from error propagation. To address this problem, we present a novel end-to-end abstractive CLS model that does not explicitly use machine translation. The architecture is based on the Transformer, which has proven effective at text generation. The model is jointly trained on the CLS task and a monolingual summarization (MS) task: a second decoder handles the MS task, while the first decoder handles the CLS task. We also incorporate multilingual word embeddings (MWE) into the architecture to further improve the performance of the CLS models. Both English and Bahasa Indonesia are represented by MWE whose embeddings have already been mapped into the same vector space; this helps the model better relate inputs and outputs in different languages. Experiments show that the proposed model achieves improvements of up to +0.2981 ROUGE-1, +0.2084 ROUGE-2, and +0.2771 ROUGE-L over the pipeline baselines, and up to +0.1288 ROUGE-1, +0.1185 ROUGE-2, and +0.1413 ROUGE-L over the end-to-end baselines.
KW - Automatic summarization
KW - Cross-lingual summarization
KW - Multilingual word embeddings
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85146676015&partnerID=8YFLogxK
U2 - 10.14569/IJACSA.2022.0131276
DO - 10.14569/IJACSA.2022.0131276
M3 - Article
AN - SCOPUS:85146676015
SN - 2158-107X
VL - 13
SP - 636
EP - 645
JO - International Journal of Advanced Computer Science and Applications
JF - International Journal of Advanced Computer Science and Applications
IS - 12
ER -