
Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning

  • 20210506


Document pages: 5 pages

Abstract: Modern approaches to text to speech require the entire input character sequence to be processed before any audio is synthesised. This latency limits the suitability of such models for time-sensitive tasks like simultaneous interpretation. Interleaving the action of reading a character with that of synthesising audio reduces this latency. However, the order of this sequence of interleaved actions varies across sentences, which raises the question of how the actions should be chosen. We propose a reinforcement-learning-based framework to train an agent to make this decision. We compare our performance against that of deterministic, rule-based systems. Our results demonstrate that our agent successfully balances the trade-off between the latency of audio generation and the quality of synthesised audio. More broadly, we show that neural sequence-to-sequence models can be adapted to run in an incremental manner.
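
To make the idea of interleaved actions concrete, here is a minimal sketch of what a deterministic, rule-based schedule could look like. It is an illustration only, not the authors' baseline or agent: the action names `READ` and `SPEAK`, the functions `lookahead_schedule` and `first_audio_delay`, and the lookahead parameter `k` are all hypothetical, and the number of output frames is assumed to be known in advance, which a real synthesiser would not know.

```python
from typing import List

# Hypothetical action labels: consume one input character, or emit one audio frame.
READ, SPEAK = "READ", "SPEAK"


def lookahead_schedule(num_chars: int, num_frames: int, k: int) -> List[str]:
    """Fixed rule: read up to k characters first, then alternate SPEAK and READ.

    Once the input is exhausted, only SPEAK actions remain. The schedule ends
    when num_frames frames have been emitted (an assumption of this sketch).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions = [READ] * min(k, num_chars)  # initial lookahead
    read, spoken = len(actions), 0
    while spoken < num_frames:
        actions.append(SPEAK)
        spoken += 1
        if read < num_chars:  # keep reading while input remains
            actions.append(READ)
            read += 1
    return actions


def first_audio_delay(actions: List[str]) -> int:
    """Number of characters read before the first audio frame is emitted."""
    return actions.index(SPEAK)


if __name__ == "__main__":
    eager = lookahead_schedule(num_chars=3, num_frames=4, k=1)
    patient = lookahead_schedule(num_chars=3, num_frames=4, k=3)
    print(eager, first_audio_delay(eager))
    print(patient, first_audio_delay(patient))
```

With a small `k` the first audio arrives sooner but is synthesised from less context; with `k` equal to the input length the schedule degenerates to reading everything before speaking. A fixed rule applies the same ordering to every sentence, whereas the abstract describes training an agent to choose the ordering.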
