
Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS

  • 20210506


Document pages: 5 pages

Abstract: Tacotron-based end-to-end speech synthesis has shown remarkable voice quality. However, the rendering of prosody in the synthesized speech remains to be improved, especially for long sentences, where prosodic phrasing errors can occur frequently. In this paper, we extend the Tacotron-based speech synthesis framework to explicitly model the prosodic phrase breaks. We propose a multi-task learning scheme for Tacotron training that optimizes the system to predict both Mel spectrum and phrase breaks. To the best of our knowledge, this is the first implementation of multi-task learning for Tacotron-based TTS with a prosodic phrasing model. Experiments show that our proposed training scheme consistently improves the voice quality for both Chinese and Mongolian systems.
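
As a rough illustration of the multi-task idea in the abstract, the sketch below combines a Mel-spectrum reconstruction loss with a phrase-break classification loss into one training objective. The abstract does not give the loss functions or how the two tasks are weighted, so the choices here are assumptions made only for illustration: mean squared error for the Mel term, binary cross-entropy for the break term, and the function names and the `weight` parameter.

```python
import math

def mel_loss(pred, target):
    """Mean squared error over flattened Mel-spectrum values (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def break_loss(probs, labels):
    """Binary cross-entropy over per-token phrase-break predictions (assumed loss).

    probs  : predicted probability that a break follows each token
    labels : 1 if a break follows the token, else 0
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)

def multitask_loss(mel_pred, mel_target, break_probs, break_labels, weight=0.5):
    """Weighted sum of the two task losses; `weight` is a hypothetical hyperparameter."""
    return mel_loss(mel_pred, mel_target) + weight * break_loss(break_probs, break_labels)

# Toy values: a perfect Mel prediction and fairly confident break predictions.
loss = multitask_loss([0.0, 1.0], [0.0, 1.0], [0.9, 0.1], [1, 0], weight=1.0)
print(round(loss, 4))
```

In a real system both terms would be computed on tensors inside the training loop so that gradients from the phrase-break task also shape the shared layers; the plain-Python version above only shows how the two objectives are combined.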
