
Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator



Document pages: 5 pages

Abstract: We introduce a novel method for emotion conversion in speech that does not require parallel training data. Our approach loosely relies on a cycle-GAN schema to minimize the reconstruction error from converting back and forth between emotion pairs. However, unlike the conventional cycle-GAN, our discriminator classifies whether a pair of input real and generated samples corresponds to the desired emotion conversion (e.g., A to B) or to its inverse (B to A). We will show that this setup, which we refer to as a variational cycle-GAN (VC-GAN), is equivalent to minimizing the empirical KL divergence between the source features and their cyclic counterpart. In addition, our generator combines a trainable deep network with a fixed generative block to implement a smooth and invertible transformation on the input features, in our case, the fundamental frequency (F0) contour. This hybrid architecture regularizes our adversarial training procedure. We use crowd sourcing to evaluate both the emotional saliency and the quality of synthesized speech. Finally, we show that our model generalizes to new speakers by modifying speech produced by Wavenet.
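
The two ideas in the abstract, a cycle reconstruction error on the F0 contour and a discriminator that sees labeled pairs rather than single samples, can be illustrated with a small toy sketch. This is not the authors' implementation: the moving-average smoother and the per-frame log-scale warp are simple stand-ins for the paper's fixed generative block, the pairing scheme is one plausible reading of the abstract, and all names (`smooth`, `apply_warp`, `cycle_error`, `make_pair_examples`) are hypothetical.

```python
import math


def smooth(values, width=1):
    """Moving average with clamped edges (assumed stand-in for a smooth warp field)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - width), min(n, i + width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def apply_warp(f0, log_offsets):
    """Scale each F0 frame by exp(offset); negating the offsets inverts the warp."""
    return [f * math.exp(o) for f, o in zip(f0, log_offsets)]


def cycle_error(f0, fwd_offsets, bwd_offsets):
    """Mean absolute error after converting A -> B and then B -> A."""
    converted = apply_warp(f0, fwd_offsets)
    restored = apply_warp(converted, bwd_offsets)
    return sum(abs(a - b) for a, b in zip(f0, restored)) / len(f0)


def make_pair_examples(real_a, fake_b, real_b, fake_a):
    """Build pair-discriminator inputs (assumed labeling).

    Label 1: (real A, generated B), i.e. the desired A -> B direction.
    Label 0: (real B, generated A), i.e. the inverse B -> A direction.
    """
    return [((real_a, fake_b), 1), ((real_b, fake_a), 0)]


# Toy F0 contours in Hz for two emotions (made-up values for illustration).
f0_a = [120.0, 130.0, 140.0, 135.0, 125.0]
f0_b = [150.0, 170.0, 190.0, 175.0, 160.0]

# Pretend a network predicted these raw offsets; smoothing keeps the warp gentle.
fwd = smooth([0.0, 0.3, 0.0, 0.3, 0.0])
bwd = [-o for o in fwd]

fake_b = apply_warp(f0_a, fwd)
fake_a = apply_warp(f0_b, bwd)
pairs = make_pair_examples(f0_a, fake_b, f0_b, fake_a)

perfect_cycle = cycle_error(f0_a, fwd, bwd)
broken_cycle = cycle_error(f0_a, fwd, [0.0] * len(f0_a))
```

When the backward warp exactly inverts the forward one, `perfect_cycle` is zero up to floating-point error, while `broken_cycle` (no backward correction) is clearly larger. In a trained model the two directions would come from separate learned generators, so this error would be minimized by training rather than guaranteed by construction.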
