Audio Feature Extraction on SIBI Dataset for Speech Recognition

Ruhush Shoalihin, Erdefi Rakun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Mel Frequency Cepstral Coefficients (MFCC) have been regarded as the standard feature extraction method for Automatic Speech Recognition (ASR) systems for the last few years. Their performance may be affected by multiple variables, such as the number of features, the audio channels, the filter width, or the type of filter bank used. In this paper, several comparisons were made to find the combination of variables that gives the best results on the SIBI (Indonesian Sign Language) dataset, which consists of sentences uttered by both Deaf and Hard of Hearing (DHH) and non-DHH people. Based on this experiment, although ASR performance on the DHH dataset is generally lower than on the non-DHH dataset, the results are still relatively good: around 4.71% WER and 10.30% SER on the DHH dataset, compared to 0.15% WER and 0.40% SER on the non-DHH dataset.

Original language: English
Title of host publication: Proceedings - 2nd International Conference on Informatics, Multimedia, Cyber, and Information System, ICIMCIS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 70-74
Number of pages: 5
ISBN (Electronic): 9781728191676
Publication status: Published - 19 Nov 2020
Event: 2nd International Conference on Informatics, Multimedia, Cyber, and Information System, ICIMCIS 2020 - Virtual, Jakarta, Indonesia
Duration: 19 Nov 2020 - 20 Nov 2020

Publication series

Name: Proceedings - 2nd International Conference on Informatics, Multimedia, Cyber, and Information System, ICIMCIS 2020

Conference

Conference: 2nd International Conference on Informatics, Multimedia, Cyber, and Information System, ICIMCIS 2020
Country/Territory: Indonesia
City: Virtual, Jakarta
Period: 19/11/20 - 20/11/20

Keywords

  • ASR
  • Automatic Speech Recognition
  • DHH
  • Mel Frequency Cepstral Coefficients
  • MFCC
  • SIBI
