CinC-GAN for Effective F0 prediction for Whisper-to-Normal Speech Conversion

  • king
  • 20210506

Document pages: 5 pages

Abstract: Recently, Generative Adversarial Networks (GAN)-based methods have shown remarkable performance for the Voice Conversion and WHiSPer-to-normal SPeeCH (WHSP2SPCH) conversion. One of the key challenges in WHSP2SPCH conversion is the prediction of fundamental frequency (F0). Recently, authors have proposed state-of-the-art method Cycle-Consistent Generative Adversarial Networks (CycleGAN) for WHSP2SPCH conversion. The CycleGAN-based method uses two different models, one for Mel Cepstral Coefficients (MCC) mapping, and another for F0 prediction, where F0 is highly dependent on the pre-trained model of MCC mapping. This leads to additional non-linear noise in predicted F0. To suppress this noise, we propose Cycle-in-Cycle GAN (i.e., CinC-GAN). It is specially designed to increase the effectiveness in F0 prediction without losing the accuracy of MCC mapping. We evaluated the proposed method on a non-parallel setting and analyzed on speaker-specific, and gender-specific tasks. The objective and subjective tests show that CinC-GAN significantly outperforms the CycleGAN. In addition, we analyze the CycleGAN and CinC-GAN for unseen speakers and the results show the clear superiority of CinC-GAN.
