TY - GEN
T1 - Hierarchical transfer learning for text-to-speech in Indonesian, Javanese, and Sundanese languages
AU - Azizah, Kurniawati
AU - Adriani, Mirna
N1 - Funding Information:
This research is funded by PUTI Saintekes grant No. NKB-2146/UN2.RST/HKP.05.00/2020 from Universitas Indonesia. This research is supported by the computing facilities at the Tokopedia-UI AI Center of Excellence.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/10/17
Y1 - 2020/10/17
N2 - This research develops end-to-end deep learning-based text-to-speech (TTS) for Indonesian, Javanese, and Sundanese. Although end-to-end neural TTS such as Tacotron-2 has made remarkable progress recently, it still suffers from data scarcity in low-resource languages such as Javanese and Sundanese. Our preliminary study shows that Tacotron-2-based TTS needs a large amount of training data: at least 10 hours are required for the model to synthesize intelligible speech of acceptable quality. To address this low-resource problem, we propose a hierarchical transfer learning scheme for training Javanese and Sundanese TTS that takes advantage of English as a dissimilar high-resource domain and Indonesian as a similar intermediate-resource domain. The mean opinion score (MOS) of the synthesized speech reaches 4.27 for Indonesian, 4.08 for Javanese, and 3.92 for Sundanese. Word accuracy (WAcc) on semantically unpredictable sentences (SUS) reaches 98.26% for Indonesian, 95.02% for Javanese, and 95.43% for Sundanese. These subjective evaluations of synthetic speech quality demonstrate that our transfer learning scheme can be applied successfully to TTS models for low-resource target domains. With less than one hour of training data (38 minutes for Indonesian, 16 minutes for Javanese, and 19 minutes for Sundanese), the TTS models learn quickly and achieve adequate performance.
KW - Deep learning
KW - Hierarchical transfer learning
KW - Indonesian
KW - Javanese
KW - Low-resource problem
KW - Sundanese
KW - Text-to-speech
UR - http://www.scopus.com/inward/record.url?scp=85099739218&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS51025.2020.9263086
DO - 10.1109/ICACSIS51025.2020.9263086
M3 - Conference contribution
AN - SCOPUS:85099739218
T3 - 2020 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
SP - 421
EP - 428
BT - 2020 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
Y2 - 17 October 2020 through 18 October 2020
ER -