
Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning

  • king
  • 20210505


Document pages: 5 pages

Abstract: This paper proposes a controllable end-to-end text-to-speech (TTS) system to control the speaking speed (speed-controllable TTS; SCTTS) of synthesized speech with sentence-level speaking-rate value as an additional input. The speaking-rate value, the ratio of the number of input phonemes to the length of input speech, is adopted in the proposed system to control the speaking speed. Furthermore, the proposed SCTTS system can control the speaking speed while retaining other speech attributes, such as the pitch, by adopting the global style token-based style encoder. The proposed SCTTS does not require any additional well-trained model or an external speech database to extract phoneme-level duration information and can be trained in an end-to-end manner. In addition, our listening tests on fast-, normal-, and slow-speed speech showed that the SCTTS can generate more natural speech than other phoneme duration control approaches which increase or decrease duration at the same rate for the entire sentence, especially in the case of slow-speed speech.
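
The speaking-rate value described in the abstract can be illustrated with a minimal sketch. The abstract does not state the unit used for the length of the input speech, so the code below assumes it is measured in seconds and derived from a sample count; the function name, parameter names, and default sample rate are hypothetical and not taken from the paper.

```python
def speaking_rate(num_phonemes: int, num_samples: int, sample_rate: int = 22050) -> float:
    """Sentence-level speaking rate as phonemes per second.

    Assumption: speech length is expressed in seconds
    (num_samples / sample_rate). The paper may use another unit,
    such as the number of spectrogram frames.
    """
    if num_phonemes < 0:
        raise ValueError("num_phonemes must be non-negative")
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    duration_s = num_samples / sample_rate  # length of the utterance in seconds
    return num_phonemes / duration_s        # ratio of phonemes to speech length


# Hypothetical utterance: 30 phonemes spoken in 2 seconds of audio.
rate = speaking_rate(num_phonemes=30, num_samples=44100, sample_rate=22050)
print(rate)
```

Under these assumptions, a larger value corresponds to faster speech for the same phoneme sequence, which is why a single scalar per sentence could serve as the conditioning input.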
