Sentence-level Indonesian lip reading with spatiotemporal CNN and gated RNN

Muhammad Rizki Aulia Rahman Maulana, Mohamad Ivan Fanany

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)

Abstract

It is widely known that visual cues play an important role in speech, especially in disambiguating confusable phonemes or as a means of 'hearing' visually. Interpreting speech through the visual signal alone is called lip reading. Lip reading has several potential applications, either as a complementary modality to speech recognition or as purely visual speech recognition. The latter gives rise to silent speech interfaces, which themselves have numerous practical applications. Despite the considerable potential of such systems, research on lip reading for the Indonesian language has been extremely limited, with settings still very distant from the real world. This research is an attempt to build a lip reading model that has the potential to be applicable in the real world, specifically one that supports variable-length sentences as input. We build the model using deep learning, specifically a spatiotemporal Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU), which form the spatiotemporal feature extractor and the character-level sentence decoder, respectively. During the process, we also investigate whether knowledge of lip reading in another language affects the acquisition of a different language. To the best of our knowledge, our model is the first sentence-level Indonesian lip reading model that supports variable-length input. Our model achieved superhuman performance on all metrics, with almost 20× better word accuracy.
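
The abstract reports word accuracy without defining it. As a rough illustration only, one common convention in speech and lip reading evaluation is word accuracy = 1 − WER, where WER is the word-level edit distance divided by the number of reference words. The sketch below assumes that convention; the function names and the example sentences are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Assumed definition: 1 - WER, with WER = word edits / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    wer = edit_distance(ref_words, hyp_words) / len(ref_words)
    return 1.0 - wer


# Illustrative sentences (not from the paper's dataset).
score = word_accuracy("saya makan nasi goreng", "saya minum nasi")
```

Under this definition the score can become negative when the hypothesis contains many insertions, so a reported figure such as "20× better" depends on exactly how the authors computed the metric.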

Original language: English
Title of host publication: 2017 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 375-380
Number of pages: 6
ISBN (Electronic): 9781538631720
Publication status: Published - 2 Jul 2017
Event: 9th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017 - Jakarta, Indonesia
Duration: 28 Oct 2017 - 29 Oct 2017

Publication series

Name: 2017 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
Volume: 2018-January

Conference

Conference: 9th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
Country/Territory: Indonesia
City: Jakarta
Period: 28/10/17 - 29/10/17
