Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR


Document pages: 5 pages

Abstract: The Transformer has shown impressive performance in automatic speech recognition. It uses an encoder-decoder structure with self-attention to learn the relationship between the high-level representation of the source inputs and the embedding of the target outputs. In this paper, we propose a novel decoder structure that features a self-and-mixed attention decoder (SMAD) with a deep acoustic structure (DAS) to improve the acoustic representation of Transformer-based LVCSR. Specifically, we introduce a self-attention mechanism to learn a multi-layer deep acoustic structure for multiple levels of acoustic abstraction. We also design a mixed attention mechanism that simultaneously learns the alignment between different levels of acoustic abstraction and their corresponding linguistic information in a shared embedding space. ASR experiments on Aishell-1 show that the proposed structure achieves CERs of 4.8 on the dev set and 5.1 on the test set, which, to the best of our knowledge, are the best results reported on this task.
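
The CER (character error rate) figures above are conventionally computed as the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below is a minimal illustration of that standard definition, not the authors' scoring script; the function names `edit_distance` and `cer` are hypothetical, and it assumes any text normalization has already been applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference items to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate in percent: 100 * edit_distance / reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

For example, `cer("abcd", "abxd")` counts one substitution over four reference characters. A corpus-level CER is usually obtained by summing edit distances and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.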
