Knowing What to Listen to: Early Attention for Deep Speech Representation Learning

  • 20210507

Document pages: 8 pages

Abstract: Deep learning techniques have considerably improved speech processing in recent years. Speech representations extracted by deep learning models are being used in a wide range of tasks such as speech recognition, speaker recognition, and speech emotion recognition. Attention models play an important role in improving deep learning models. However, current attention mechanisms are unable to attend to fine-grained information items. In this paper, we propose the novel Fine-grained Early Frequency Attention (FEFA) for speech signals. This model is capable of focusing on information items as small as frequency bins. We evaluate the proposed model on two popular tasks: speaker recognition and speech emotion recognition. Two widely used public datasets, VoxCeleb and IEMOCAP, are used for our experiments. The model is implemented on top of several prominent deep models as backbone networks to evaluate its impact on performance compared to the original networks and other related work. Our experiments show that by adding FEFA to different CNN architectures, performance is consistently improved by substantial margins, even setting a new state-of-the-art for the speaker recognition task. We also tested our model against different levels of added noise, showing improvements in robustness and less sensitivity compared to the backbone networks.
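
To make the idea of attending to individual frequency bins concrete, the following is a minimal illustrative sketch and not the FEFA architecture itself, which the abstract does not specify. It assumes a spectrogram stored as a list of frames, and it gates every bin with its own sigmoid weight before the values would be passed to a backbone network. The function name `frequency_attention` and the per-bin parameters `weights` and `biases` are hypothetical; in a real model they would be learned.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def frequency_attention(spectrogram, weights, biases):
    """Scale each frequency bin of each frame by its own gate in (0, 1).

    spectrogram: list of frames, each a list of F bin magnitudes.
    weights, biases: hypothetical per-bin parameters of length F
                     (assumed here; they would be learned in practice).
    """
    attended = []
    for frame in spectrogram:
        # One gate per frequency bin, computed from that bin's own value.
        gates = [sigmoid(w * x + b) for x, w, b in zip(frame, weights, biases)]
        # Element-wise re-weighting keeps the spectrogram shape unchanged.
        attended.append([g * x for g, x in zip(gates, frame)])
    return attended


if __name__ == "__main__":
    spec = [[1.0, 2.0, 0.5], [0.0, 4.0, 1.0]]  # 2 frames, 3 frequency bins
    out = frequency_attention(spec, weights=[0.0, 0.0, 0.0], biases=[0.0, 0.0, 0.0])
    print(out)
```

Because the output has the same shape as the input, a gate of this kind can in principle be placed in front of any CNN backbone without changing the backbone, which matches the abstract's description of adding the attention module on top of existing networks.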
