TY - JOUR
T1 - End-to-end Indonesian speech recognition with convolutional and gated recurrent units
AU - Adiwidjaja, Rifqi
AU - Fanany, M. Ivan
N1 - Publisher Copyright: © Published under licence by IOP Publishing Ltd.
PY - 2020/7/3
Y1 - 2020/7/3
N2 - Automatic speech recognition (ASR) has become deeply embedded in everyday life. For well-resourced languages it can be considered solved, but that is not the case for under-resourced languages such as Bahasa Indonesia. Although it is the 7th most spoken language in the world, research on speech recognition for Bahasa Indonesia is still extremely limited, and existing setups remain impractical for real-world and industrial use. This research attempts to build a speech recognition model that is applicable to the real world and industry, specifically one that supports sentence-level input of variable character length and can be trained end to end. We built the model using a deep learning approach, specifically residual networks and a bi-directional gated recurrent unit (Bi-GRU). To the best of our knowledge, this is the first Indonesian ASR model that can be trained in an end-to-end manner. Our model surpassed the baseline model on all metrics and is competitive with the current best result on the dataset, which used the visual modality, even though audio is a more difficult and noise-prone modality.
UR - http://www.scopus.com/inward/record.url?scp=85087771193&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/1566/1/012118
DO - 10.1088/1742-6596/1566/1/012118
M3 - Conference article
AN - SCOPUS:85087771193
SN - 1742-6588
VL - 1566
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012118
T2 - 4th International Conference on Computing and Applied Informatics 2019, ICCAI 2019
Y2 - 26 November 2019 through 27 November 2019
ER -