Stochastic Gradient Variational Bayes for deep learning-based ASR

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, Mirna Adriani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

10 Citations (Scopus)


Many successful methods for training deep neural networks (DNNs) rely on an unsupervised pretraining algorithm. Pretraining is particularly effective when the number of labeled training samples is not large enough, because it initializes the parameter values in an appropriate range near a good local minimum for subsequent discriminative fine-tuning. However, while the improvement is impressive, training a DNN remains difficult because its objective function is a highly non-convex function of the parameters. To avoid placing the parameters in a region that generalizes poorly, robust generative modelling is necessary. This paper explores an alternative generative model for pretraining DNN-based acoustic models: Stochastic Gradient Variational Bayes (SGVB) within an autoencoder framework, called the Variational Bayes Autoencoder (VBAE). It performs efficient approximate inference and learning with directed probabilistic graphical models. During fine-tuning, the probabilistic encoder parameters with latent variable components are then used in discriminative training of the acoustic model. Here, we investigate the performance of DNN-based acoustic models using the proposed pretrained VBAE in comparison with widely used pretraining algorithms such as the Restricted Boltzmann Machine (RBM) and the Stacked Denoising Autoencoder (SDAE). The results reveal that VBAE pretraining with Gaussian latent variables gave the best performance.
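The SGVB objective with Gaussian latent variables mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single hidden layer, the layer sizes, the tanh activations, the unit-variance Gaussian decoder, and all function names are assumptions chosen for illustration. The sketch only evaluates a one-sample estimate of the variational lower bound; in practice an automatic-differentiation framework would be used to maximize it during pretraining.

```python
import numpy as np

def init_params(rng, n_in, n_hidden, n_latent):
    """Small random weights for a one-hidden-layer encoder and decoder (hypothetical sizes)."""
    def w(m, n):
        return rng.normal(0.0, 0.01, size=(m, n))
    return {
        "W_enc": w(n_in, n_hidden), "b_enc": np.zeros(n_hidden),
        "W_mu": w(n_hidden, n_latent), "b_mu": np.zeros(n_latent),
        "W_logvar": w(n_hidden, n_latent), "b_logvar": np.zeros(n_latent),
        "W_dec": w(n_latent, n_hidden), "b_dec": np.zeros(n_hidden),
        "W_out": w(n_hidden, n_in), "b_out": np.zeros(n_in),
    }

def encode(params, x):
    """Probabilistic encoder: mean and log-variance of the diagonal Gaussian q(z|x)."""
    h = np.tanh(x @ params["W_enc"] + params["b_enc"])
    mu = h @ params["W_mu"] + params["b_mu"]
    logvar = h @ params["W_logvar"] + params["b_logvar"]
    return mu, logvar

def decode(params, z):
    """Decoder: mean of an assumed unit-variance Gaussian p(x|z)."""
    h = np.tanh(z @ params["W_dec"] + params["b_dec"])
    return h @ params["W_out"] + params["b_out"]

def gaussian_kl(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), one value per example."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)

def sgvb_lower_bound(params, x, rng):
    """One-sample SGVB estimate of the variational lower bound, averaged over the batch."""
    mu, logvar = encode(params, x)
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I), so the
    # sample is a deterministic, differentiable function of mu and logvar.
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    x_hat = decode(params, z)
    # Log-likelihood of x under the unit-variance Gaussian decoder.
    log_px = -0.5 * np.sum((x - x_hat) ** 2 + np.log(2.0 * np.pi), axis=1)
    return np.mean(log_px - gaussian_kl(mu, logvar))

rng = np.random.default_rng(0)
params = init_params(rng, n_in=40, n_hidden=64, n_latent=16)
x = rng.standard_normal((8, 40))  # stand-in for a batch of acoustic feature vectors
elbo = sgvb_lower_bound(params, x, rng)
```

After pretraining of this kind, the abstract's fine-tuning step would reuse the encoder parameters (here `W_enc`, `W_mu`, `W_logvar` and their biases) as the initialization for a discriminatively trained acoustic model; how the latent components are wired into that model is not specified in the abstract and is left out of the sketch.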

Original language: English
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781479972913
Publication status: Published - 10 Feb 2016
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration: 13 Dec 2015 to 17 Dec 2015

Publication series

Name: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings


Conference: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country/Territory: United States


  • acoustic model
  • autoencoder
  • deep neural network
  • variational Bayes

