A Comprehensive Exploration of Fine-Tuning WavLM for Enhancing Speech Emotion Recognition

Fadel Ali, Aniati Murni Arymurthy, Radityo Eko Prasojo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Speech Emotion Recognition (SER) is a pivotal area in Human-Computer Interaction (HCI) with numerous applications. Traditional SER models rely on supervised learning but face challenges due to limited labeled data and the subjective nature of emotions. Self-supervised learning (SSL) offers an alternative by leveraging unlabeled audio data. WavLM, a large-scale SSL audio model, has shown promise in various speech processing tasks. This paper investigates the fine-tuning of WavLM for SER, analyzing the impact of fine-tuning different segments of layers on performance. We conducted experiments on the IEMOCAP and RAVDESS datasets, comparing various fine-tuned WavLM models with Wav2Vec 2.0 and HuBERT as SSL baselines. Results reveal that WavLM outperforms the SSL baselines, with all-layer fine-tuned WavLM achieving state-of-the-art results. Interestingly, fine-tuning the top layers significantly enhances performance, suggesting their role in encoding paralinguistic information. However, fine-tuning all layers remains superior. These findings shed light on optimizing SSL audio models for SER and highlight WavLM's potential in emotion recognition.
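
The layer-segment comparison described in the abstract can be illustrated with a minimal sketch of how one might decide which parameters to fine-tune and which to freeze. Everything specific here is an assumption for illustration, not the authors' setup: the 12-layer encoder, the four-layer "top" segment, the parameter-name pattern `encoder.layers.<i>.`, the `classifier.` head prefix, and the choice to keep the feature extractor frozen.

```python
import re

# Assumed naming scheme for transformer blocks; the abstract does not specify one.
LAYER_PATTERN = re.compile(r"encoder\.layers\.(\d+)\.")


def trainable_flags(param_names, layers_to_tune, tune_head=True):
    """Map each parameter name to True (fine-tune) or False (freeze)."""
    layers_to_tune = set(layers_to_tune)
    flags = {}
    for name in param_names:
        match = LAYER_PATTERN.search(name)
        if match:
            # Transformer block: trainable only if its index was selected.
            flags[name] = int(match.group(1)) in layers_to_tune
        else:
            # Anything else is frozen, except the hypothetical classification head.
            flags[name] = tune_head and name.startswith("classifier.")
    return flags


# Toy parameter list for an assumed 12-layer encoder (hypothetical names).
NUM_LAYERS = 12
names = ["feature_extractor.conv.weight"]
names += [f"encoder.layers.{i}.attention.weight" for i in range(NUM_LAYERS)]
names += ["classifier.weight"]

top_segment = range(NUM_LAYERS - 4, NUM_LAYERS)  # assumed "top" segment: layers 8..11
all_layers = range(NUM_LAYERS)                   # all transformer layers

top_flags = trainable_flags(names, top_segment)
all_flags = trainable_flags(names, all_layers)
```

With a PyTorch model, flags like these could be applied by iterating over `model.named_parameters()` and setting each parameter's `requires_grad` attribute accordingly; the actual parameter names would need to be checked against the model in use.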

Original language: English
Title of host publication: 6th International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2023 - Proceeding
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 295-300
Number of pages: 6
ISBN (Electronic): 9798350358346
Publication status: Published - 2023
Event: 6th International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2023 - Batam, Indonesia
Duration: 11 Dec 2023 → …

Publication series

Name: 6th International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2023 - Proceeding

Conference

Conference: 6th International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2023
Country/Territory: Indonesia
City: Batam
Period: 11/12/23 → …

Keywords

  • audio model
  • fine-tuning
  • self-supervised learning
  • Speech emotion recognition
  • WavLM
