Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN


Document pages: 7 pages

Abstract: Cross-lingual voice conversion aims to change a source speaker's voice to sound like that of a target speaker when the source and target speakers speak different languages. It relies on non-parallel training data from two different languages and is hence more challenging than mono-lingual voice conversion. Previous studies on cross-lingual voice conversion mainly focus on spectral conversion, with a linear transformation for F0 transfer. However, as an important prosodic factor, F0 is inherently hierarchical, so a linear method alone is insufficient for its conversion. We propose the use of continuous wavelet transform (CWT) decomposition for F0 modeling. CWT provides a way to decompose a signal into different temporal scales that explain prosody at different time resolutions. We also propose to train two CycleGAN pipelines, for spectrum and prosody mapping respectively. In this way, we eliminate the need for parallel data of any two languages and for any alignment techniques. Experimental results show that our proposed Spectrum-Prosody-CycleGAN framework outperforms the Spectrum-CycleGAN baseline in subjective evaluation. To the best of our knowledge, this is the first study of prosody in cross-lingual voice conversion.
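
The contrast between a linear F0 transfer and a multi-scale CWT view of F0 can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the linear baseline is written in its common log-Gaussian normalized form, the CWT uses a Mexican hat wavelet at dyadic scales, and the function names, the number of scales, and the wavelet support width are hypothetical choices.

```python
import numpy as np


def linear_f0_transfer(f0_src, src_mean, src_std, tgt_mean, tgt_std):
    """Log-Gaussian normalized F0 transfer (assumed form of the linear baseline).

    The means and standard deviations are statistics of log-F0 for each
    speaker. Unvoiced frames (F0 == 0) are left at 0.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    log_f0 = np.log(f0_src[voiced])
    out[voiced] = np.exp((log_f0 - src_mean) / src_std * tgt_std + tgt_mean)
    return out


def interpolate_unvoiced(f0):
    """Linearly interpolate log-F0 over unvoiced frames so the contour is continuous."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    frames = np.arange(len(f0))
    return np.interp(frames, frames[voiced], np.log(f0[voiced]))


def mexican_hat(scale, width=10.0):
    """Sampled Mexican hat wavelet at a given scale (in frames).

    `width` is the support in units of the scale (an illustrative choice).
    """
    half = int(np.ceil(width * scale / 2))
    t = np.arange(-half, half + 1) / scale
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0) / np.sqrt(scale)


def cwt_decompose(signal, num_scales=5):
    """Decompose a 1-D contour into `num_scales` dyadic temporal scales.

    Returns an array of shape (num_scales, len(signal)) and the scales used.
    Small scales capture fast variations, large scales capture slow trends.
    """
    signal = np.asarray(signal, dtype=float)
    scales = 2.0 ** np.arange(1, num_scales + 1)
    rows = []
    for scale in scales:
        kernel = mexican_hat(scale)
        half = (len(kernel) - 1) // 2
        # Full convolution, then crop so the output is aligned with the input
        # even when the kernel is longer than the signal.
        full = np.convolve(signal, kernel, mode="full")
        rows.append(full[half:half + len(signal)])
    return np.stack(rows), scales
```

In this sketch the linear transfer only shifts and rescales the log-F0 contour as a whole, whereas each row returned by `cwt_decompose` isolates variation at one time resolution. A typical use would apply `interpolate_unvoiced` to an F0 track, normalize the result to zero mean and unit variance, and pass it to `cwt_decompose`.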
