Visual-only word boundary detection

Muhammad Rizki Aulia Rahman Maulana, Retno Larasati, Mohamad Ivan Fanany

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review


Word boundary detection is one of the primary components of a speech recognition system. It can be learned jointly as part of the speech model or independently as an extra preprocessing step, which reduces the problem to conditionally independent word prediction. It can also be used to separate out-of-vocabulary (OOV) words in a sentence, thereby avoiding unnecessary computation. On its own, word boundary detection is essential in multimodal corpus collection, where it allows automated and detailed labeling of the dataset at the sentence or word level. In this research, we propose a novel approach to word boundary detection that uses only visual information, based on a 3-Dimensional Convolutional Neural Network (3D-CNN) and a Bidirectional Gated Recurrent Unit (Bi-GRU). This research is important in paving the way for better lip reading systems, as well as multimodal speech recognition, as it allows easier creation of novel datasets and enables conventional word-level visual or multimodal speech recognition systems to work on continuous speech. Training was done on the GRID video corpus for 118 epochs. The proposed model performed well compared to the baseline method, with a considerably lower error rate.

Original language: English
Title of host publication: Multi-disciplinary Trends in Artificial Intelligence - 11th International Workshop, MIWAI 2017, Proceedings
Editors: Somnuk Phon-Amnuaisuk, Swee-Peng Ang, Soo-Young Lee
Publisher: Springer Verlag
Number of pages: 12
ISBN (Print): 9783319694559
Publication status: Published - 2017
Event: 11th Multi-disciplinary International Workshop on Artificial Intelligence, MIWAI 2017 - Gadong, Brunei Darussalam
Duration: 20 Nov 2017 to 22 Nov 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10607 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 11th Multi-disciplinary International Workshop on Artificial Intelligence, MIWAI 2017
Country/Territory: Brunei Darussalam


  • 3-Dimensional convolutional neural network
  • Speech recognition
  • Word boundary detection
  • Word segmentation

