Hierarchical transfer learning for text-to-speech in Indonesian, Javanese, and Sundanese languages

Kurniawati Azizah, Mirna Adriani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

This research develops end-to-end, deep learning-based text-to-speech (TTS) systems for Indonesian, Javanese, and Sundanese. While end-to-end neural TTS, such as Tacotron-2, has made remarkable progress recently, it still suffers from data scarcity in low-resource languages such as Javanese and Sundanese. Our preliminary study shows that Tacotron-2-based TTS needs a large amount of training data: at least 10 hours are required for the model to synthesize intelligible speech of acceptable quality. To address this low-resource problem, we propose a hierarchical transfer learning scheme for training Javanese and Sundanese TTS that draws on a dissimilar high-resource language (English) and a similar intermediate-resource language (Indonesian). The mean opinion score (MOS) of the synthesized speech reaches 4.27 for Indonesian, 4.08 for Javanese, and 3.92 for Sundanese. Word accuracy (WAcc) on semantically unpredictable sentences (SUS) reaches 98.26% for Indonesian, 95.02% for Javanese, and 95.43% for Sundanese. These subjective evaluations of synthetic speech quality demonstrate that our transfer learning scheme can be applied successfully to TTS models for low-resource target domains. With less than one hour of training data (38 minutes for Indonesian, 16 minutes for Javanese, and 19 minutes for Sundanese), the TTS models learn quickly and achieve adequate performance.
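As an illustration of the WAcc figures quoted above, the following is a minimal sketch of one common way to compute word accuracy from listener transcriptions. It assumes WAcc is defined as 100 × (1 − word errors / reference words), with errors counted by word-level edit distance; the abstract does not state the exact formula the authors used. The function names and the sample sentence pair are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(pairs):
    """Return WAcc in percent over (reference_text, transcript_text) pairs.

    Assumed definition: 100 * (1 - total_errors / total_reference_words).
    """
    total_words = 0
    total_errors = 0
    for reference_text, transcript_text in pairs:
        ref_tokens = reference_text.lower().split()
        hyp_tokens = transcript_text.lower().split()
        total_words += len(ref_tokens)
        total_errors += word_errors(ref_tokens, hyp_tokens)
    if total_words == 0:
        raise ValueError("no reference words to score")
    return 100.0 * (1 - total_errors / total_words)


if __name__ == "__main__":
    # Hypothetical SUS-style prompt and one listener's transcription.
    sample = [("the table walked through the blue truth",
               "the table walked through the blue tooth")]
    print(f"WAcc = {word_accuracy(sample):.2f}%")
```

Because insertions count as errors, this definition can go below zero for very poor transcriptions; clipping at zero is a further design choice left out of the sketch.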

Original language: English
Title of host publication: 2020 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 421-428
Number of pages: 8
ISBN (Electronic): 9781728192796
Publication status: Published - 17 Oct 2020
Event: 12th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020 - Virtual, Depok, Indonesia
Duration: 17 Oct 2020 to 18 Oct 2020

Publication series

Name: 2020 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020

Conference

Conference: 12th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
Country/Territory: Indonesia
City: Virtual, Depok
Period: 17/10/20 to 18/10/20

Keywords

  • Deep learning
  • Hierarchical transfer learning
  • Indonesian
  • Javanese
  • Low-resource problem
  • Sundanese
  • Text-to-speech
