TY - GEN
T1 - Indonesian audio-visual speech corpus for multimodal automatic speech recognition
AU - Maulana, Muhammad Rizki Aulia Rahman
AU - Fanany, Mohamad Ivan
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/2
Y1 - 2017/7/2
N2 - Advancement of Automatic Speech Recognition (ASR) relies heavily on the availability of data, even more so for deep learning ASR systems, which are at the forefront of ASR research. A multitude of corpora has been built to accommodate this need, ranging from single-modal corpora, which mostly cater to acoustic speech recognition with several exceptions for visual speech decoding, to multimodal corpora, which serve both modalities. Multimodal corpora have been significant in the development of ASR, as speech is inherently multimodal in the first place. Despite their importance, no such corpus has been built for the Indonesian language, resulting in little to no development of visual-only or multimodal ASR systems for it. This research attempts to solve that problem by constructing AVID, an Indonesian audio-visual speech corpus for multimodal ASR. The corpus consists of 10 speakers speaking 1,040 sentences with a simple structure, resulting in 10,400 videos of spoken sentences. To the best of our knowledge, AVID is the first audio-visual speech corpus for the Indonesian language designed for multimodal ASR. AVID was extensively tested and shows low overall error rates in tests of both modalities, indicating the corpus's high quality and its suitability for building multimodal ASR systems.
AB - Advancement of Automatic Speech Recognition (ASR) relies heavily on the availability of data, even more so for deep learning ASR systems, which are at the forefront of ASR research. A multitude of corpora has been built to accommodate this need, ranging from single-modal corpora, which mostly cater to acoustic speech recognition with several exceptions for visual speech decoding, to multimodal corpora, which serve both modalities. Multimodal corpora have been significant in the development of ASR, as speech is inherently multimodal in the first place. Despite their importance, no such corpus has been built for the Indonesian language, resulting in little to no development of visual-only or multimodal ASR systems for it. This research attempts to solve that problem by constructing AVID, an Indonesian audio-visual speech corpus for multimodal ASR. The corpus consists of 10 speakers speaking 1,040 sentences with a simple structure, resulting in 10,400 videos of spoken sentences. To the best of our knowledge, AVID is the first audio-visual speech corpus for the Indonesian language designed for multimodal ASR. AVID was extensively tested and shows low overall error rates in tests of both modalities, indicating the corpus's high quality and its suitability for building multimodal ASR systems.
UR - http://www.scopus.com/inward/record.url?scp=85051115512&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS.2017.8355062
DO - 10.1109/ICACSIS.2017.8355062
M3 - Conference contribution
AN - SCOPUS:85051115512
T3 - 2017 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
SP - 381
EP - 385
BT - 2017 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
Y2 - 28 October 2017 through 29 October 2017
ER -