
Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

  • king
  • 20210506


Document pages: 5 pages

Abstract: Recent neural speech synthesis systems have increasingly focused on controlling prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody together with the correlation between prosody and semantics. In this paper, a prosody learning mechanism is proposed to model the prosody of speech in a TTS system, where the prosody information of speech is extracted from the mel-spectrum by a prosody learner and combined with the phoneme sequence to reconstruct the mel-spectrum. Meanwhile, semantic features of the text from a pre-trained language model are introduced to improve the prosody prediction results. In addition, a novel self-attention structure, named local attention, is proposed to lift the restriction on input text length; it models the relative position information of the sequence with relative position matrices, so position encodings are no longer needed. Experiments on English and Mandarin show that our model produces speech with more satisfactory prosody. In Mandarin synthesis in particular, the proposed model outperforms the baseline model by a MOS gap of 0.08, and the overall naturalness of the synthesized speech is significantly improved.
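The abstract does not spell out how local attention is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single attention head, a fixed window of neighbouring positions, and one learned embedding per relative offset that is added to the attention scores. The names `local_attention`, `rel_k`, and `window` are hypothetical and chosen for illustration. Because the parameters depend only on the offset between two positions and not on their absolute index, the same weights apply to a sequence of any length.

```python
import numpy as np

def local_attention(x, w_q, w_k, w_v, rel_k, window):
    """Single-head self-attention restricted to a local window (illustrative).

    x:       (T, d) input sequence
    w_q/k/v: (d, d) projection matrices
    rel_k:   (2 * window + 1, d) embeddings for offsets -window..window
             (an assumed parameterisation of the relative position matrix)
    """
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # offsets[i, j] = j - i, the relative position of key j w.r.t. query i
    idx = np.arange(T)
    offsets = idx[None, :] - idx[:, None]
    inside = np.abs(offsets) <= window

    # Content score plus a score from the relative-position embedding
    content = q @ k.T
    rel_index = np.clip(offsets, -window, window) + window
    position = np.einsum("id,ijd->ij", q, rel_k[rel_index])
    scores = (content + position) / np.sqrt(d)

    # Positions outside the window receive zero attention weight
    scores = np.where(inside, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Usage: the same parameters handle a short and a long sequence.
rng = np.random.default_rng(0)
d, window = 8, 2
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
rel_k = rng.normal(size=(2 * window + 1, d))
short_out, _ = local_attention(rng.normal(size=(5, d)), w_q, w_k, w_v, rel_k, window)
long_out, _ = local_attention(rng.normal(size=(50, d)), w_q, w_k, w_v, rel_k, window)
```

This sketch materialises a full T-by-T score matrix for readability; a practical system would compute only the entries inside the window so that cost grows linearly with text length.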
