
Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

  • king
  • 20210505


Document pages: 5 pages

Abstract: In this work, we propose a novel and efficient minimum word error rate (MWER) training method for RNN-Transducer (RNN-T). Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists. The hypothesis probability scores and back-propagated gradients are calculated efficiently using the forward-backward algorithm. Moreover, the proposed method allows us to decouple the decoding and training processes, and thus we can perform offline parallel decoding and MWER training for each subset iteratively. Experimental results show that this proposed semi-on-the-fly method can speed up the on-the-fly method by 6 times and result in a similar WER improvement (3.6%) over a baseline RNN-T model. The proposed MWER training can also effectively reduce high-deletion errors (9.2% WER reduction) introduced by RNN-T models when EOS is added for the endpointer. Further improvement can be achieved if we use a proposed RNN-T rescoring method to re-rank hypotheses and use an external RNN-LM to perform additional rescoring. The best system achieves a 5% relative improvement on an English test set of real far-field recordings and an 11.6% WER reduction on music-domain utterances.
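
The two computations the abstract describes, summing over all alignments of a hypothesis and taking an expected word-error count over an N-best list, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the [T, U+1, V] layout of the joint-network log-probabilities, the blank index 0, and the subtraction of the mean error count (a common variance-reduction choice in MWER formulations) are all assumptions made for illustration. Only the forward pass is shown; gradients would come from the matching backward pass or from automatic differentiation.

```python
import numpy as np


def rnnt_log_prob(log_probs, labels, blank=0):
    """Log-probability of one hypothesis, summed over all RNN-T alignments.

    log_probs: array of shape [T, U+1, V] with log-softmax outputs of the
               joint network (assumed layout, not taken from the paper).
    labels:    list of U label indices for the hypothesis.
    """
    T, U1, _ = log_probs.shape
    U = len(labels)
    assert U1 == U + 1, "second axis must have length len(labels) + 1"
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting blank at (t-1, u): advance one frame.
            stay = alpha[t - 1, u] + log_probs[t - 1, u, blank] if t > 0 else -np.inf
            # Arrive by emitting label u at (t, u-1): advance one label.
            emit = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] if u > 0 else -np.inf
            alpha[t, u] = np.logaddexp(stay, emit)
    # A final blank closes every alignment.
    return alpha[T - 1, U] + log_probs[T - 1, U, blank]


def word_errors(ref, hyp):
    """Word-level Levenshtein distance between two space-separated strings."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (rw != hw)))
        prev = cur
    return prev[-1]


def mwer_loss(hyp_log_probs, errors):
    """Expected word errors over an N-best list.

    Hypothesis probabilities are renormalized over the list; the mean error
    count is subtracted as a baseline (an assumption of this sketch).
    """
    lp = np.asarray(hyp_log_probs, dtype=float)
    w = np.asarray(errors, dtype=float)
    p_hat = np.exp(lp - np.logaddexp.reduce(lp))
    return float(np.sum(p_hat * (w - w.mean())))
```

Because the N-best list only needs to be decoded once per subset, the hypothesis scores can be recomputed with rnnt_log_prob during training without running beam search inside the training loop, which is the decoupling the abstract refers to.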
