SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition

Document pages: 5 pages

Abstract: End-to-end speech recognition has become popular in recent years because it integrates the acoustic, pronunciation, and language models into a single neural network. Among end-to-end approaches, attention-based methods have emerged as superior. One example is the Transformer, which adopts an encoder-decoder architecture. The key improvement introduced by the Transformer is the use of self-attention instead of recurrent mechanisms, which enables both the encoder and the decoder to capture long-range dependencies with lower computational complexity. In this work, we propose boosting the self-attention ability with a DFSMN memory block, forming the proposed memory equipped self-attention (SAN-M) mechanism. We make theoretical and empirical comparisons to demonstrate the relevance and complementarity of self-attention and the DFSMN memory block. Furthermore, the proposed SAN-M provides an efficient mechanism for integrating these two modules. We have evaluated our approach on the public AISHELL-1 benchmark and on an industrial-level 20,000-hour Mandarin speech recognition task. On both tasks, SAN-M systems achieved much better performance than the self-attention-based Transformer baseline system. In particular, SAN-M achieves a CER of 6.46% on the AISHELL-1 task even without using any external LM, comfortably outperforming other state-of-the-art systems.
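To make the idea of pairing self-attention with a memory block concrete, here is a minimal single-head sketch in plain Python. It rests on assumptions that the abstract does not state: the DFSMN memory block is modeled as a bidirectional depthwise FIR filter over time (element-wise lookback coefficients `a`, lookahead coefficients `c`, stride 1, plus an identity term), it is applied to the value projection, and its output is simply added to the attention output. Multi-head splitting, output projections, and normalization are omitted, and all function names are hypothetical illustrations rather than the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain row-by-column matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = len(q[0])
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / math.sqrt(d) for kr in k] for qr in q]
    return matmul([softmax(r) for r in scores], v)


def fsmn_memory(h: Matrix, a: Matrix, c: Matrix) -> Matrix:
    # Hypothetical DFSMN-style memory (stride 1, zero padding at the edges):
    #   m_t = h_t + sum_i a[i] * h_{t-i} + sum_{j>=1} c[j-1] * h_{t+j}
    # where each coefficient vector is applied element-wise (depthwise).
    T, D = len(h), len(h[0])
    out = []
    for t in range(T):
        m = list(h[t])  # identity term
        for i, coef in enumerate(a):  # lookback taps, a[0] is the current frame
            if t - i >= 0:
                for d in range(D):
                    m[d] += coef[d] * h[t - i][d]
        for j, coef in enumerate(c, start=1):  # lookahead taps
            if t + j < T:
                for d in range(D):
                    m[d] += coef[d] * h[t + j][d]
        out.append(m)
    return out


def san_m_layer(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix, a: Matrix, c: Matrix) -> Matrix:
    # Hypothetical SAN-M-style layer: attention output plus memory over V.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    attn = self_attention(q, k, v)
    mem = fsmn_memory(v, a, c)
    return [[p + r for p, r in zip(row_a, row_m)] for row_a, row_m in zip(attn, mem)]
```

The attention term lets every frame see the whole utterance, while the FIR memory adds a fixed, local context window around each frame; summing the two is one simple way to let them complement each other.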
