Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR

Andros Tjandra, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

13 Citations (Scopus)

Abstract

This paper explores the use of auditory features based on cochleograms; two dimensional speech features derived from gammatone filters within the convolutional neural network (CNN) framework. Furthermore, we also propose various possibilities to combine cochleogram features with log-mel filter banks or spectrogram features. In particular, we combine within low and high levels of CNN framework which we refer to as low-level and high-level feature combination. As comparison, we also construct the similar configuration with deep neural network (DNN). Performance was evaluated in the framework of hybrid neural network - hidden Markov model (NN-HMM) system on TIMIT phoneme sequence recognition task. The results reveal that cochleogram-spectrogram feature combination provides significant advantages. The best accuracy was obtained by high-level combination of two dimensional cochleogram-spectrogram features using CNN, achieved up to 8.2% relative phoneme error rate (PER) reduction from CNN single features or 19.7% relative PER reduction from DNN single features.

Original languageEnglish
Title of host publication2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages4525-4529
Number of pages5
ISBN (Electronic)9781467369978
DOIs
Publication statusPublished - 1 Jan 2015
Event40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia
Duration: 19 Apr 201424 Apr 2014

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2015-August
ISSN (Print)1520-6149

Conference

Conference40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
CountryAustralia
CityBrisbane
Period19/04/1424/04/14

Keywords

  • cochleogram
  • Deep learning
  • DNN and CNN
  • feature combination

Fingerprint Dive into the research topics of 'Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR'. Together they form a unique fingerprint.

Cite this