
SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement


Document pages: 5 pages

Abstract: Recent advances in Generative Adversarial Networks (GANs) in the speech synthesis domain [3], [2] have shown that it is possible to train GANs [8] reliably for high-quality, coherent waveform generation from mel-spectrograms. We propose that MelGAN's [3] robustness in learning speech features can be transferred to the speech enhancement and noise reduction domain without any modification to the model. Our proposed method generalizes over a multi-speaker speech dataset and robustly handles unseen background noises during inference. We also show that increasing the batch size for this particular approach not only yields better speech results, but also generalizes easily over the multi-speaker dataset and leads to faster convergence. Additionally, it outperforms the previous state-of-the-art GAN approach for speech enhancement, SEGAN [5], in two respects: (1) quality and (2) speed. The proposed method runs more than 100x faster than real time on GPU and more than 2x faster than real time on CPU without any hardware optimization, matching the speed of MelGAN [3].
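The "faster than real time" figures above can be read as the ratio of audio duration to processing time. The following is a minimal sketch of how such a ratio might be measured; it is not the authors' benchmarking code. The function names `realtime_speedup` and `time_enhancer`, the stand-in `enhance` callable, and the 22050 Hz sample rate in the usage lines are all assumptions made for illustration.

```python
import time


def realtime_speedup(num_samples, sample_rate, elapsed_seconds):
    """Return audio duration divided by processing time.

    A value of 100.0 means the audio was processed 100x faster than
    real time; a value below 1.0 means slower than real time.
    """
    if sample_rate <= 0 or elapsed_seconds <= 0:
        raise ValueError("sample_rate and elapsed_seconds must be positive")
    audio_seconds = num_samples / sample_rate
    return audio_seconds / elapsed_seconds


def time_enhancer(enhance, samples, sample_rate):
    """Run a hypothetical `enhance` callable and measure its speedup.

    `enhance` stands in for any model that maps noisy samples to
    enhanced samples; it is not an API from the paper.
    """
    start = time.perf_counter()
    output = enhance(samples)
    elapsed = time.perf_counter() - start
    return output, realtime_speedup(len(samples), sample_rate, elapsed)


if __name__ == "__main__":
    # Assumed sample rate, for illustration only.
    sample_rate = 22050
    noisy = [0.0] * (sample_rate * 10)  # ten seconds of silence

    def dummy_enhance(samples):
        # Placeholder that copies its input; a real model goes here.
        return list(samples)

    _, speedup = time_enhancer(dummy_enhance, noisy, sample_rate)
    print(f"{speedup:.1f}x faster than real time")
```

Under this definition, processing ten seconds of audio in 0.1 seconds corresponds to a 100x speedup, and processing it in five seconds corresponds to 2x.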
