A Transfer Learning Method for Speech Emotion Recognition from Automatic Speech Recognition

  • king
  • 20210506

Document pages: 4 pages

Abstract: This paper presents a transfer learning method for speech emotion recognition based on a Time-Delay Neural Network (TDNN) architecture. A major challenge in current speech-based emotion detection research is data scarcity. The proposed method addresses this problem by applying transfer learning techniques to leverage data from the automatic speech recognition (ASR) task, for which ample data is available. Our experiments also show the advantage of speaker-class adaptation modeling techniques by adopting identity-vector (i-vector) based features in addition to standard Mel-Frequency Cepstral Coefficient (MFCC) features.[1] We show that the transfer learning models significantly outperform the methods without pretraining on ASR. The experiments were performed on the publicly available IEMOCAP dataset, which provides 12 hours of emotional speech data. The transfer learning model was initialized using the Ted-Lium v.2 speech dataset, which provides 207 hours of audio with the corresponding transcripts. We achieve significantly higher accuracy than the state of the art, using five-fold cross validation. Using only speech, we obtain an accuracy of 71.7% for anger, excitement, sadness, and neutrality emotion content.
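
The initialization step described in the abstract can be illustrated with a minimal sketch. It is not the paper's TDNN and makes several assumptions: a plain stack of weight matrices stands in for the network, the layer sizes and the 40 + 100 input dimension (MFCC plus i-vector) are invented for illustration, and the names `init_layer`, `build_model`, and `transfer_to_emotion` are hypothetical. Only the idea itself comes from the abstract: reuse the layers pretrained on ASR and replace the output layer with one covering the four emotion classes.

```python
import copy
import random

# The four target classes named in the abstract.
EMOTIONS = ["anger", "excitement", "sadness", "neutrality"]


def init_layer(n_in, n_out, rng):
    """Return a small random weight matrix: n_out rows of n_in weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def build_model(layer_sizes, rng):
    """Build a list of weight matrices for consecutive layer sizes."""
    return [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]


def transfer_to_emotion(asr_model, n_emotions, rng):
    """Copy every layer except the ASR output layer, then add a fresh
    output layer with one row per emotion class."""
    hidden = copy.deepcopy(asr_model[:-1])   # independent copy of pretrained layers
    n_hidden = len(asr_model[-1][0])         # input width of the old output layer
    return hidden + [init_layer(n_hidden, n_emotions, rng)]


rng = random.Random(0)

# Assumed sizes: 40 MFCC + 100 i-vector dimensions per frame, two hidden
# layers of 64 units, and 500 ASR output targets. None are from the paper.
FEATURE_DIM = 40 + 100
asr_model = build_model([FEATURE_DIM, 64, 64, 500], rng)  # stands in for an ASR-pretrained net

emotion_model = transfer_to_emotion(asr_model, len(EMOTIONS), rng)
```

In this sketch the copied layers start from the ASR weights and would then be fine-tuned on the emotion data, while the new output layer is trained from a random start.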
