
Controllable Neural Prosody Synthesis

  • king
  • 20210506


Document pages: 5 pages

Abstract: Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators. However, these systems lack intuitive user controls over prosody, making them unable to rectify prosody errors (e.g., misplaced emphases and contextually inappropriate emotions) or generate prosodies with diverse speaker excitement levels and emotions. We address these limitations with a user-controllable, context-aware neural prosody generator. Given a real or synthesized speech recording, our model allows a user to input prosody constraints for certain time frames and generates the remaining time frames from input text and contextual prosody. We also propose a pitch-shifting neural vocoder to modify input speech to match the synthesized prosody. Through objective and subjective evaluations we show that we can successfully incorporate user control into our prosody generation model without sacrificing the overall naturalness of the synthesized speech.
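
The constraint mechanism described in the abstract (a user fixes prosody on some time frames and the remaining frames are generated) can be illustrated with a small sketch. This is not the paper's model: the function name `fill_unconstrained_frames` is hypothetical, prosody is reduced to a single F0 value per frame, and plain linear interpolation stands in for the neural generator, which in the paper also conditions on the input text and contextual prosody.

```python
import numpy as np

def fill_unconstrained_frames(user_f0, constrained):
    """Keep user-constrained F0 frames and fill the rest.

    Assumption: linear interpolation is only a placeholder for the
    neural prosody generator described in the abstract.
    """
    user_f0 = np.asarray(user_f0, dtype=float)
    constrained = np.asarray(constrained, dtype=bool)
    if user_f0.shape != constrained.shape:
        raise ValueError("user_f0 and constrained must have the same shape")
    if not constrained.any():
        raise ValueError("at least one frame must be constrained")
    frames = np.arange(len(user_f0))
    # Constrained frames are reproduced exactly; frames outside the
    # constrained range take the nearest constrained value.
    return np.interp(frames, frames[constrained], user_f0[constrained])

# Illustrative values in Hz; the user pins frames 0 and 3 only.
f0 = np.array([120.0, 0.0, 0.0, 180.0, 0.0])
mask = np.array([True, False, False, True, False])
contour = fill_unconstrained_frames(f0, mask)
```

The point of the sketch is the interface: the constrained frames pass through unchanged, so whatever generator replaces the interpolation step only has to produce the frames the user left open.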
