Multi-speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network

  • 20210505

Document pages: 5 pages

Abstract: We propose a novel method for emotion conversion in speech based on a chained encoder-decoder-predictor neural network architecture. The encoder constructs a latent embedding of the fundamental frequency (F0) contour and the spectrum, which we regularize using the Large Deformation Diffeomorphic Metric Mapping (LDDMM) registration framework. The decoder uses this embedding to predict the modified F0 contour in a target emotional class. Finally, the predictor uses the original spectrum and the modified F0 contour to generate a corresponding target spectrum. Our joint objective function simultaneously optimizes the parameters of all three model blocks. We show that our method outperforms existing state-of-the-art approaches on both the saliency of emotion conversion and the quality of resynthesized speech. In addition, the LDDMM regularization allows our model to convert phrases that were not present in training, thus providing evidence for out-of-sample generalization.
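The chained data flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the single random linear layer per block, the L1 reconstruction terms, the weight `lam`, and the function names `forward` and `joint_loss` are all assumptions made for illustration. In particular, the LDDMM regularizer is replaced here by a plain L2 penalty on the latent embedding, only to show where a regularization term would enter a joint objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: frames per utterance, spectral bins, latent dimension.
T, N_SPEC, N_LATENT = 50, 24, 8

def linear(n_in, n_out):
    """Random weights for one linear layer (placeholder for a trained network)."""
    return rng.normal(scale=0.1, size=(n_in, n_out))

W_enc = linear(1 + N_SPEC, N_LATENT)   # encoder: [F0, spectrum] -> latent embedding
W_dec = linear(N_LATENT, 1)            # decoder: latent embedding -> F0 change
W_pred = linear(N_SPEC + 1, N_SPEC)    # predictor: [spectrum, modified F0] -> spectrum

def forward(f0, spec):
    """Run the three blocks in sequence on f0 (T, 1) and spec (T, N_SPEC)."""
    latent = np.tanh(np.concatenate([f0, spec], axis=1) @ W_enc)
    f0_mod = f0 + latent @ W_dec                      # modified F0 contour
    spec_mod = np.concatenate([spec, f0_mod], axis=1) @ W_pred
    return latent, f0_mod, spec_mod

def joint_loss(f0, spec, f0_target, spec_target, lam=0.1):
    """Sum of F0 error, spectrum error, and a latent penalty.

    The penalty is a simple L2 stand-in, NOT the LDDMM regularizer.
    """
    latent, f0_mod, spec_mod = forward(f0, spec)
    f0_term = np.mean(np.abs(f0_mod - f0_target))
    spec_term = np.mean(np.abs(spec_mod - spec_target))
    reg_term = lam * np.mean(latent ** 2)
    return f0_term + spec_term + reg_term
```

Because a single scalar loss depends on the weights of all three blocks, minimizing it with any gradient-based optimizer would update the encoder, decoder, and predictor together, which is the sense in which the abstract calls the objective joint.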
