
Applying Speech Tempo-Derived Features, BoAW and Fisher Vectors to Detect Elderly Emotion and Speech in Surgical Masks

  • king
  • 20210506


Document pages: 5 pages

Abstract: The 2020 INTERSPEECH Computational Paralinguistics Challenge (ComParE) consists of three Sub-Challenges, where the tasks are to identify the level of arousal and valence of elderly speakers, determine whether the speaker is wearing a surgical mask, and estimate the actual breathing of the speaker. In our contribution to the Challenge, we focus on the Elderly Emotion and the Mask Sub-Challenges. Besides utilizing standard or close-to-standard features such as ComParE functionals, Bag-of-Audio-Words and Fisher vectors, we exploit the fact that emotion is related to the velocity of speech (i.e. speech rate). To utilize this, we perform phone-level recognition using an ASR system, and extract features from the output such as articulation tempo, speech tempo, and various attributes measuring the amount of pauses. We also hypothesize that wearing a surgical mask makes the speaker feel uneasy, leading to a slower speech rate and more hesitations; hence, we experiment with the same features in the Mask Sub-Challenge as well. Although this hypothesis was not supported by the experimental results on the Mask Sub-Challenge, in the Elderly Emotion Sub-Challenge we obtained significantly improved arousal and valence values with this feature type, both on the development set and in cross-validation.
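
To make the tempo-derived features mentioned in the abstract more concrete, the following is a minimal sketch of how such values might be computed from a phone-level ASR output. It is not the authors' implementation. The abstract does not give exact definitions, so this sketch assumes that speech tempo is the number of phones per second over the whole utterance (pauses included) and that articulation tempo is the number of phones per second with pauses excluded. The `Phone` record, the `PAUSE_LABELS` set and the `tempo_features` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Phone:
    """One segment of a hypothetical phone-level ASR output."""
    label: str    # phone symbol, or a pause marker
    start: float  # start time in seconds
    end: float    # end time in seconds


# Assumption: the recognizer marks silent pauses with these labels.
PAUSE_LABELS = {"sil", "sp"}


def tempo_features(phones: List[Phone]) -> Dict[str, float]:
    """Compute simple tempo and pause statistics for one utterance."""
    total_time = sum(p.end - p.start for p in phones)
    pauses = [p for p in phones if p.label in PAUSE_LABELS]
    pause_time = sum(p.end - p.start for p in pauses)
    speech_time = total_time - pause_time
    n_phones = len(phones) - len(pauses)

    return {
        # Assumed definition: phones per second, pauses included.
        "speech_tempo": n_phones / total_time if total_time > 0 else 0.0,
        # Assumed definition: phones per second, pauses excluded.
        "articulation_tempo": n_phones / speech_time if speech_time > 0 else 0.0,
        # Two simple attributes measuring the amount of pauses.
        "pause_count": float(len(pauses)),
        "pause_ratio": pause_time / total_time if total_time > 0 else 0.0,
    }


if __name__ == "__main__":
    # Made-up segmentation, used only to show the call.
    example = [
        Phone("h", 0.0, 0.1),
        Phone("e", 0.1, 0.3),
        Phone("sil", 0.3, 0.8),
        Phone("l", 0.8, 0.9),
        Phone("o", 0.9, 1.0),
    ]
    print(tempo_features(example))
```

Under these assumptions, an utterance with many or long pauses has a speech tempo noticeably lower than its articulation tempo, which is the kind of difference that pause-related attributes are meant to capture.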
