
Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

  • 20210507

Document pages: 5 pages

Abstract: End-to-end (E2E) automatic speech recognition (ASR) systems lack the distinct language model (LM) component that characterizes traditional speech systems. While this simplifies the model architecture, it complicates the task of incorporating text-only data into training, which is important to the recognition of tail words that do not occur often in audio-text pairs. While shallow fusion has been proposed as a method for incorporating a pre-trained LM into an E2E model at inference time, it has not yet been explored for very large text corpora, and it has been shown to be very sensitive to hyperparameter settings in the beam search. In this work, we apply shallow fusion to incorporate a very large text corpus into a state-of-the-art E2E ASR model. We explore the impact of model size and show that intelligent pruning of the training set can be more effective than increasing the parameter count. Additionally, we show that incorporating the LM in minimum word error rate (MWER) fine tuning makes shallow fusion far less dependent on optimal hyperparameter settings, reducing the difficulty of that tuning problem.
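
For readers unfamiliar with shallow fusion, the sketch below shows the basic idea in its commonly described form: the E2E model's log-probability is combined with a weighted LM log-probability when ranking hypotheses. It is an illustration only, not the paper's implementation. The function names, the `lm_weight` value, and the scores are all hypothetical, and the interpolation is applied to complete hypotheses for simplicity, whereas shallow fusion in a real decoder is applied at each step of the beam search.

```python
def shallow_fusion_score(asr_logprob, lm_logprob, lm_weight):
    """Log-linear interpolation of E2E and LM scores (hypothetical helper)."""
    return asr_logprob + lm_weight * lm_logprob


def rank_hypotheses(hypotheses, lm_weight):
    """Sort (text, asr_logprob, lm_logprob) tuples, best fused score first."""
    return sorted(
        hypotheses,
        key=lambda h: shallow_fusion_score(h[1], h[2], lm_weight),
        reverse=True,
    )


# Made-up scores: the E2E model slightly prefers the misspelled name,
# while the text-only LM strongly prefers the common spelling.
hypotheses = [
    ("call jon", -1.0, -6.0),
    ("call john", -1.2, -2.0),
]

best_without_lm = rank_hypotheses(hypotheses, lm_weight=0.0)[0][0]
best_with_lm = rank_hypotheses(hypotheses, lm_weight=0.3)[0][0]
```

With these made-up numbers the top hypothesis changes once the LM weight is non-zero, which illustrates why the outcome can be sensitive to that hyperparameter, as the abstract notes.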
