
S-vectors Speaker Embeddings based on Transformers Encoder for Text-Independent Speaker Verification

  • 20210506


Document pages: 5 pages

Abstract: X-vectors have become the standard for speaker embeddings in automatic speaker verification. X-vectors are obtained using a Time-Delay Neural Network (TDNN) with context over several frames. We have explored the use of an architecture built on self-attention, which attends to all the features over the entire utterance and hence better captures speaker-level characteristics. We have used the encoder structure of Transformers, which is built on self-attention, as the base architecture and trained it to do a speaker classification task. In this paper, we propose to derive speaker embeddings from the output of the trained Transformer encoder structure after appropriate statistics pooling to obtain utterance-level features. We have named the speaker embeddings from this structure s-vectors. S-vectors outperform x-vectors with relative improvements of 10% and 15% in EER when trained on the Voxceleb-1 only and Voxceleb-1+2 datasets, respectively. We have also investigated the effect of deriving s-vectors from different layers of the model.
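
The statistics pooling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract only says "appropriate statistics pooling", so the choice of per-dimension mean and standard deviation (the convention commonly used for x-vectors) is an assumption here, as are the function name `statistics_pooling` and the toy input values. The sketch shows how a variable number of frame-level encoder outputs collapses into one fixed-size utterance-level vector.

```python
import math


def statistics_pooling(frames):
    """Pool frame-level vectors into one utterance-level vector.

    Assumption: pooling is the per-dimension mean concatenated with the
    per-dimension standard deviation; the abstract does not specify this.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])

    # Per-dimension mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]

    # Per-dimension population standard deviation (divides by the number
    # of frames; this normalisation is also an assumption).
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]

    # The output length is 2 * dim regardless of how many frames came in.
    return means + stds


# Hypothetical encoder outputs: 3 frames, 2 features each.
encoder_outputs = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
pooled = statistics_pooling(encoder_outputs)
```

Because the output size depends only on the feature dimension, utterances of any length map to vectors of the same size, which is what lets a classifier layer (and later an embedding extractor) sit on top of the pooled features.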
