
Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages

  • king
  • 20210506

Document pages: 5 pages

Abstract: Recently, sequence-to-sequence models with attention have been successfully applied to text-to-speech (TTS). Given a large, accurately transcribed speech corpus, these models can generate near-human speech. However, preparing such a large dataset is both expensive and laborious. To alleviate this heavy data demand, we propose a novel unsupervised pre-training mechanism in this paper. Specifically, we first use a Vector-Quantization Variational Autoencoder (VQ-VAE) to extract unsupervised linguistic units from large-scale, publicly available, untranscribed speech. We then pre-train the sequence-to-sequence TTS model on <unsupervised linguistic units, audio> pairs. Finally, we fine-tune the model with a small amount of <text, audio> paired data from the target speaker. Both objective and subjective evaluations show that, with the same amount of paired training data, our proposed method synthesizes more intelligible and natural speech. In addition, we extend the proposed method to hypothesized low-resource languages and verify its effectiveness using objective evaluation.
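
The first stage described above, turning untranscribed speech into discrete units that can stand in for text during pre-training, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it shows only the VQ-VAE quantization step (each encoder output frame is replaced by the index of its nearest codebook vector), with a fixed toy 2-D codebook instead of a learned one. The function names are hypothetical, and merging runs of repeated codes into a shorter unit sequence is an assumed post-processing step that the abstract does not specify.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """VQ-VAE quantization step: map each encoder output frame to the
    index of its nearest codebook vector (argmin of squared distance)."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in frames
    ]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive codes.

    Assumption: a common way to turn frame-level codes into a shorter,
    text-like unit sequence; not confirmed by the source."""
    return [u for u, _ in groupby(units)]


def make_pretraining_pair(
    frames: Sequence[Vector], codebook: Sequence[Vector], audio_features: object
) -> Tuple[List[int], object]:
    """Build one hypothetical <unsupervised linguistic units, audio> pair."""
    return collapse_repeats(quantize(frames, codebook)), audio_features


if __name__ == "__main__":
    # Toy codebook and encoder outputs (hypothetical values for illustration).
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    frames = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1], [0.0, 0.1]]
    print(quantize(frames, codebook))  # [0, 0, 1, 1, 2, 0]
    units, _ = make_pretraining_pair(frames, codebook, audio_features=None)
    print(units)  # [0, 1, 2, 0]
```

In the pipeline the abstract outlines, unit sequences like `units` would replace text as the model input during pre-training, and fine-tuning would then switch the input to real text from the small <text, audio> set.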
