Deep Neural Network for Speaker Identification Using Static and Dynamic Prosodic Feature for Spontaneous and Dictated Data

Arifan Rahman, Wahyu Catur Wibowo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We can recognize a person by voice alone. In principle, each person's voice has a distinct tone (pitch). This study measures the performance of a Deep Neural Network (DNN) for speaker identification using static and dynamic prosodic features. Prosody is information about speech related to the tone, intonation, stress, duration, and rhythm of a person's pronunciation. The data consist of dictated and spontaneous speech taken from YouTube, comprising three male voices and one female voice. The data are segmented into durations of 3, 5, and 10 seconds. After segmentation, static prosodic features with 103 dimensions and dynamic prosodic features with 13 dimensions are extracted. Each feature set, and the combination of the two, is used to train and test a DNN with a 90:10 training-to-testing ratio. The results show that the 10-second segments yield higher accuracy than the shorter segments, and that static prosodic features are more accurate than dynamic prosodic features. The average DNN accuracy is 87.02% for static prosodic features, 72.97% for dynamic prosodic features, and 87.72% for combined static and dynamic prosodic features.
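The segmentation and 90:10 split described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `segment_signal` and `split_90_10`, the 16 kHz sample rate, the choice to drop a trailing partial segment, and the shuffled split are all hypothetical, and prosodic feature extraction and the DNN itself are omitted.

```python
import random


def segment_signal(samples, sample_rate, seconds):
    """Split a 1-D sequence of samples into non-overlapping fixed-length segments.

    Assumption: a trailing remainder shorter than one full segment is dropped.
    """
    seg_len = int(sample_rate * seconds)
    if seg_len <= 0:
        raise ValueError("segment length must be positive")
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]


def split_90_10(items, seed=0):
    """Shuffle items and split them into 90% training and 10% testing portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.9 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical input: 60 seconds of silence at an assumed 16 kHz sample rate.
SAMPLE_RATE = 16000
recording = [0.0] * (SAMPLE_RATE * 60)

for seconds in (3, 5, 10):
    segments = segment_signal(recording, SAMPLE_RATE, seconds)
    train, test = split_90_10(segments)
    print(seconds, len(segments), len(train), len(test))
```

In a full pipeline, each segment would be converted to a feature vector (103 static or 13 dynamic prosodic dimensions, per the abstract) before the split; the sketch splits raw segments only to keep the example self-contained.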

Original language: English
Title of host publication: 2021 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665442640
Publication status: Published - 2021
Event: 13th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021 - Depok, Indonesia
Duration: 23 Oct 2021 to 26 Oct 2021

Publication series

Name: 2021 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021

Conference

Conference: 13th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021
Country/Territory: Indonesia
City: Depok
Period: 23/10/21 to 26/10/21

Keywords

  • deep neural network
  • prosodic
  • signal processing
  • voice
