
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

  • king
  • 20210505


Document pages: 5 pages

Abstract: We present a novel high-fidelity real-time neural vocoder called VocGAN. A recently developed GAN-based vocoder, MelGAN, produces speech waveforms in real-time. However, it often produces a waveform that is insufficient in quality or inconsistent with acoustic characteristics of the input mel spectrogram. VocGAN is nearly as fast as MelGAN, but it significantly improves the quality and consistency of the output waveform. VocGAN applies a multi-scale waveform generator and a hierarchically-nested discriminator to learn multiple levels of acoustic properties in a balanced way. It also applies the joint conditional and unconditional objective, which has shown successful results in high-resolution image synthesis. In experiments, VocGAN synthesizes speech waveforms 416.7x faster on a GTX 1080Ti GPU and 3.24x faster on a CPU than real-time. Compared with MelGAN, it also exhibits significantly improved quality in multiple evaluation metrics including mean opinion score (MOS) with minimal additional overhead. Additionally, compared with Parallel WaveGAN, another recently developed high-fidelity vocoder, VocGAN is 6.98x faster on a CPU and exhibits higher MOS.
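
To make the speed figures above concrete, here is a minimal Python sketch of the arithmetic behind "N times faster than real-time". The function and constant names are hypothetical and are not taken from the paper; only the three numbers (416.7, 3.24, 6.98) come from the abstract. The Parallel WaveGAN CPU figure is merely implied by dividing the two quoted CPU numbers, assuming both were measured under the same conditions, and is not a value the abstract states.

```python
def speed_vs_real_time(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute (hypothetical helper)."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


def seconds_to_synthesize(audio_seconds: float, speed: float) -> float:
    """Compute time needed for a clip at a given real-time speed factor."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return audio_seconds / speed


# Figures quoted in the abstract.
VOCGAN_GPU_SPEED = 416.7      # x real-time on a GTX 1080Ti GPU
VOCGAN_CPU_SPEED = 3.24       # x real-time on a CPU
VOCGAN_OVER_PWG_CPU = 6.98    # VocGAN speed relative to Parallel WaveGAN on a CPU

# Assumption: the CPU comparison used the same setup, so the ratio can be inverted.
implied_pwg_cpu_speed = VOCGAN_CPU_SPEED / VOCGAN_OVER_PWG_CPU

if __name__ == "__main__":
    clip = 10.0  # seconds of audio, an arbitrary example length
    print(f"GPU: {seconds_to_synthesize(clip, VOCGAN_GPU_SPEED):.3f} s of compute")
    print(f"CPU: {seconds_to_synthesize(clip, VOCGAN_CPU_SPEED):.3f} s of compute")
    print(f"Implied Parallel WaveGAN CPU speed: {implied_pwg_cpu_speed:.2f}x real-time")
```

Under these assumptions, a speed factor below 1 means synthesis takes longer than the audio lasts, which is the case for the implied Parallel WaveGAN CPU figure.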
