
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition


Document pages: 5 pages

Abstract: Streaming end-to-end automatic speech recognition (E2E-ASR) has recently attracted growing attention, and considerable effort has gone into turning non-streaming attention-based E2E-ASR systems into streaming architectures. In this work, we propose a novel online E2E-ASR system that uses Streaming Chunk-Aware Multihead Attention (SCAMA) and a latency-control memory-equipped self-attention network (LC-SAN-M). LC-SAN-M uses chunk-level input to control the latency of the encoder. In SCAMA, a jointly trained predictor is used to control the encoder output that is fed to the decoder, which enables the decoder to generate output in a streaming manner. Experimental results on the open 170-hour AISHELL-1 task and an industrial-level 20000-hour Mandarin speech recognition task show that our approach significantly outperforms the MoChA-based baseline system under a comparable setup. On the AISHELL-1 task, our proposed method achieves a character error rate (CER) of 7.39%, which, to the best of our knowledge, is the best published performance for online ASR.
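
The idea of chunk-level input bounding encoder latency can be illustrated with a simple attention mask. The sketch below is not the paper's LC-SAN-M implementation; it only assumes that each frame may attend to every frame in its own chunk plus a configurable number of earlier chunks. The function name `chunk_attention_mask` and the parameters `chunk_size` and `left_chunks` are hypothetical and chosen for illustration.

```python
def chunk_attention_mask(num_frames, chunk_size, left_chunks=None):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Illustrative assumption (not taken from the paper): frame i sees all
    frames of its own chunk plus up to `left_chunks` preceding chunks, or
    every preceding chunk when `left_chunks` is None.
    """
    if num_frames < 0 or chunk_size <= 0:
        raise ValueError("num_frames must be >= 0 and chunk_size must be > 0")
    if left_chunks is not None and left_chunks < 0:
        raise ValueError("left_chunks must be >= 0 or None")

    mask = []
    for i in range(num_frames):
        chunk = i // chunk_size  # index of the chunk containing frame i
        first_chunk = 0 if left_chunks is None else max(0, chunk - left_chunks)
        start = first_chunk * chunk_size                  # earliest visible frame
        end = min(num_frames, (chunk + 1) * chunk_size)   # one past the chunk end
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


if __name__ == "__main__":
    # Print the mask for 5 frames in chunks of 2, with full left context.
    for row in chunk_attention_mask(5, 2):
        print("".join("x" if visible else "." for visible in row))
```

Under this assumption no frame looks further ahead than the end of its own chunk, so the lookahead is at most `chunk_size - 1` frames, which is what keeps the latency bounded.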
