
Investigation of Speaker-adaptation methods in Transformer based ASR

  • king
  • 20210506


Document pages: 5 pages

Abstract: End-to-end models are fast replacing conventional hybrid models in automatic speech recognition. A transformer is a sequence-to-sequence framework based solely on attention that was initially applied to machine translation. This end-to-end framework has been shown to give promising results when used for automatic speech recognition as well. In this paper, we explore different ways of incorporating speaker information while training a transformer-based model to improve its performance. We present speaker information in the form of speaker embeddings for each of the speakers. Two broad categories of speaker embeddings are used: (i) fixed embeddings, and (ii) learned embeddings. We experiment using speaker embeddings learned along with the model training, as well as one-hot vectors and x-vectors. Using these different speaker embeddings, we obtain an average relative improvement of 1% to 3% in the token error rate. We report results on the NPTEL lecture database. NPTEL is an open-source e-learning portal providing content from top Indian universities.
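
The two embedding categories named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the embeddings are combined with the acoustic input, so concatenating the speaker vector to every frame is an assumption made here for illustration. All names (`one_hot`, `init_learned_table`, `append_speaker_embedding`) and the toy dimensions are hypothetical. X-vectors, the other fixed embedding mentioned, would come from a separately trained speaker model and are not shown.

```python
import random

def one_hot(speaker_index, num_speakers):
    """Fixed embedding: a vector with a single 1.0 at the speaker's index."""
    return [1.0 if i == speaker_index else 0.0 for i in range(num_speakers)]

def init_learned_table(num_speakers, dim, seed=0):
    """Learned embedding: a lookup table with one vector per speaker.
    Only the random initialisation is shown; in training, these values
    would be updated together with the rest of the model."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_speakers)]

def append_speaker_embedding(frames, embedding):
    """Concatenate the same speaker vector to every acoustic frame.
    This combination scheme is an assumption, not taken from the paper."""
    return [list(frame) + list(embedding) for frame in frames]

# Hypothetical toy utterance: 3 frames of 4-dimensional features,
# spoken by speaker 2 out of 5 speakers.
frames = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.6, 0.7, 0.8],
          [0.9, 1.0, 1.1, 1.2]]

fixed = append_speaker_embedding(frames, one_hot(2, 5))
table = init_learned_table(num_speakers=5, dim=3)
learned = append_speaker_embedding(frames, table[2])

print(len(fixed[0]), len(learned[0]))  # 9 7
```

With one-hot vectors the appended dimension grows with the number of speakers, whereas a learned table keeps it fixed at a chosen size, which is one practical difference between the two categories.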
