TY - GEN
T1 - Identifying Indonesian local languages on spontaneous speech data
AU - Saputri, Mei Silviana
AU - Adriani, Mirna
PY - 2019/10
Y1 - 2019/10
N2 - Local languages are the most widely used medium of communication in the daily conversations of Indonesian people. Preserving these local languages is crucial, especially for maintaining linguistic and cultural identities. However, the variety of local languages raises communication problems. One initial solution is to develop a spoken language identification system that recognizes different languages. This study developed a spoken language identification system for speech data in Indonesian local languages, including Javanese, Sundanese, Madurese, Minangkabau, and Musi. The dataset used in this study is spontaneous speech collected from local radio broadcasts for each language. This spontaneous dataset contains a lot of noise. Therefore, suitable feature extraction and classification methods are required to develop a robust language identification system. In this study, three features are combined to identify languages: acoustic features based on i-vectors, phonotactic features based on parallel phonemes, and a dynamic prosody feature. These features are merged at the hidden layer of a Deep Neural Network (DNN). The experimental results show that the F1-scores achieved by combining these features with the DNN on speech data of 3, 10, and 30 seconds duration are 87.85%, 93.46%, and 96.73%, respectively.
KW - Deep neural network
KW - Feature engineering
KW - Local language identification
KW - Speech data
UR - http://www.scopus.com/inward/record.url?scp=85081090516&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS47736.2019.8979939
DO - 10.1109/ICACSIS47736.2019.8979939
M3 - Conference contribution
T3 - 2019 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
SP - 247
EP - 254
BT - 2019 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
Y2 - 12 October 2019 through 13 October 2019
ER -