Identifying Indonesian local languages on spontaneous speech data

Mei Silviana Saputri, Mirna Adriani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Local languages are the most widely used medium of communication in the daily conversations of Indonesian people. Preserving these languages is crucial, especially for maintaining linguistic and cultural identities. However, the variety of local languages creates communication problems. One initial solution is to develop a spoken language identification system that recognizes different languages. This study developed a spoken language identification system for Indonesian local languages, covering Javanese, Sundanese, Madurese, Minangkabau, and Musi. The dataset consists of spontaneous speech collected from local radio broadcasts in each language. Because this spontaneous speech contains a lot of noise, suitable feature extraction and classification methods are required to build a robust language identification system. Three features are combined to identify languages: acoustic features based on i-vectors, phonotactic features based on parallel phonemes, and a dynamic prosody feature. These features are merged at the hidden layer of a Deep Neural Network (DNN). The experimental results show that combining these features with the DNN achieves F1-scores of 87.85%, 93.46%, and 96.73% on speech segments of 3, 10, and 30 seconds, respectively.
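The fusion step described in the abstract, where three feature streams are merged at a hidden layer, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimensions (400, 100, 20), the layer sizes, the ReLU activations, the use of concatenation as the merge operation, and the function names init_fusion_dnn and forward are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

# The five target languages named in the abstract.
LANGUAGES = ["Javanese", "Sundanese", "Madurese", "Minangkabau", "Musi"]


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_fusion_dnn(dims, branch_units=32, fused_units=64, n_classes=5, seed=0):
    """Create random (untrained) weights. All sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    branches = [
        (rng.normal(0.0, 0.1, (d, branch_units)), np.zeros(branch_units))
        for d in dims
    ]
    fused = (
        rng.normal(0.0, 0.1, (branch_units * len(dims), fused_units)),
        np.zeros(fused_units),
    )
    out = (rng.normal(0.0, 0.1, (fused_units, n_classes)), np.zeros(n_classes))
    return {"branches": branches, "fused": fused, "out": out}


def forward(params, inputs):
    """Project each feature stream separately, merge at a hidden layer, classify."""
    hidden = [relu(x @ W + b) for x, (W, b) in zip(inputs, params["branches"])]
    merged = np.concatenate(hidden, axis=-1)  # assumed merge: concatenation
    W, b = params["fused"]
    h = relu(merged @ W + b)
    W, b = params["out"]
    return softmax(h @ W + b)  # one probability per language


# Demo with random stand-ins for the i-vector, phonotactic, and prosody features.
dims = (400, 100, 20)  # hypothetical feature sizes
rng = np.random.default_rng(1)
batch = [rng.normal(size=(4, d)) for d in dims]
probs = forward(init_fusion_dnn(dims), batch)
print([LANGUAGES[i] for i in probs.argmax(axis=1)])
```

Giving each stream its own projection before merging lets features of very different sizes contribute on a comparable scale. Whether the paper uses this exact arrangement is not stated in the abstract.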

Original language: English
Title of host publication: 2019 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 247-254
Number of pages: 8
ISBN (Electronic): 9781728152929
Publication status: Published - Oct 2019
Event: 11th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019 - Bali, Indonesia
Duration: 12 Oct 2019 - 13 Oct 2019

Publication series

Name: 2019 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019

Conference

Conference: 11th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
Country: Indonesia
City: Bali
Period: 12/10/19 - 13/10/19

Keywords

  • Deep neural network
  • Feature engineering
  • Local language identification
  • Speech data
