Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

  • 20210506

Document pages: 5 pages

Abstract: Deep neural networks with dual-path bi-directional long short-term memory (BiLSTM) blocks have proved very effective in sequence modeling, especially in speech separation. This work investigates how to extend dual-path BiLSTM into a new state-of-the-art approach, called TasTas, for multi-talker monaural speech separation (a.k.a. the cocktail party problem). TasTas introduces two simple but effective improvements to boost the performance of dual-path BiLSTM based networks: one is an iterative multi-stage refinement scheme, and the other corrects imperfectly separated speech through a loss on speaker identity consistency between the separated speech and the original speech. TasTas takes the mixed utterance of two speakers and maps it to two separated utterances, each containing only one speaker's voice. Our experiments on the notable benchmark WSJ0-2mix data corpus result in 20.55dB SDR improvement, 20.35dB SI-SDR improvement, 3.69 of PESQ, and 94.86 of ESTOI, which shows that our proposed networks can lead to big performance improvement on the speaker separation task. We have open sourced our re-implementation of the DPRNN-TasNet here (this https URL), and our TasTas is realized based on this implementation of DPRNN-TasNet; it is believed that the results in this paper can be reproduced with ease.
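
The abstract reports an SI-SDR improvement but does not define the metric. As a rough illustration, the sketch below follows the commonly used definition of scale-invariant SDR: project the zero-mean estimate onto the zero-mean reference, then compare the energy of that projection with the energy of the residual. The improvement is the SI-SDR of the separated signal minus the SI-SDR of the unprocessed mixture, both measured against the same clean reference. The function names (`si_sdr`, `si_sdr_improvement`) and the `eps` stabilizer are assumptions made for this example; this is not the authors' evaluation code.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def _zero_mean(x):
    """Return a copy of x with its mean removed."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB (hypothetical helper, common definition)."""
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)
    # Optimal scaling of the reference that best explains the estimate.
    scale = _dot(est, ref) / (_dot(ref, ref) + eps)
    target = [scale * r for r in ref]
    # Whatever the scaled reference does not explain counts as distortion.
    noise = [e - t for e, t in zip(est, target)]
    return 10.0 * math.log10((_dot(target, target) + eps) / (_dot(noise, noise) + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDR of the separated signal minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

For a two-speaker mixture, the improvement would typically be computed per speaker against that speaker's clean signal and then averaged; how estimates are matched to speakers is left out of this sketch.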
