Double Multi-Head Attention for Speaker Verification

  • 2021-05-05

Document pages: 5

Abstract: Most state-of-the-art deep learning systems for speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature extractor front-end together with a pooling layer that encodes variable-length utterances into fixed-length speaker vectors. In this paper we present Double Multi-Head Attention pooling, which extends our previous approach based on Self Multi-Head Attention. An additional self-attention layer is added to the pooling layer to summarize the context vectors produced by Multi-Head Attention into a unique speaker representation. This method enhances the pooling mechanism by weighting the information captured by each head, and it results in more discriminative speaker embeddings. We have evaluated our approach on the VoxCeleb2 dataset. Our results show 6.09% and 5.23% relative improvements in terms of EER compared to Self-Attention pooling and Self Multi-Head Attention, respectively. According to the obtained results, Double Multi-Head Attention is an effective approach for selecting the most relevant features captured by CNN-based front-ends from the speech signal.
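The abstract describes a two-stage pooling scheme: a first multi-head self-attention layer pools frame-level features into one context vector per head, and a second self-attention layer weights those head context vectors to form a single utterance-level representation. The sketch below illustrates this idea in NumPy; the exact parameterization (splitting the feature dimension across heads, a single learnable vector per head, and a head-level attention vector `u`) is an assumption for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def double_mha_pooling(H, W_heads, u):
    """Illustrative double multi-head attention pooling.

    H:        (T, D) frame-level features from the CNN front-end.
    W_heads:  (K, Dk) one attention vector per head, with D = K * Dk
              (hypothetical learnable parameters).
    u:        (Dk,) attention vector of the second, head-level layer
              (hypothetical learnable parameter).
    Returns a (Dk,) utterance-level speaker representation.
    """
    T, D = H.shape
    K, Dk = W_heads.shape
    assert D == K * Dk, "feature dim must split evenly across heads"
    Hs = H.reshape(T, K, Dk)                       # per-head sub-vectors

    # First attention: per-head weights over the T frames.
    scores = np.einsum('tkd,kd->tk', Hs, W_heads)  # (T, K)
    alpha = softmax(scores, axis=0)                # normalize over time
    ctx = np.einsum('tk,tkd->kd', alpha, Hs)       # (K, Dk) context vectors

    # Second attention: weights over the K head context vectors.
    beta = softmax(ctx @ u)                        # (K,)
    return beta @ ctx                              # weighted sum of heads
```

The second softmax is what distinguishes this from plain multi-head attention pooling: instead of concatenating the K context vectors with equal importance, each head's contribution is reweighted before producing the final embedding.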
