
SpeedySpeech: Efficient Neural Speech Synthesis

  • king
  • 20210506


Document pages: 5 pages

Abstract: While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesis, there has not been a system capable of fast training, fast inference and high-quality audio synthesis at the same time. We propose a student-teacher network capable of high-quality faster-than-real-time spectrogram synthesis, with low requirements on computational resources and fast training time. We show that self-attention layers are not necessary for generation of high-quality audio. We utilize simple convolutional blocks with residual connections in both student and teacher networks and use only a single attention layer in the teacher model. Coupled with a MelGAN vocoder, our model's voice quality was rated significantly higher than Tacotron 2. Our model can be efficiently trained on a single GPU and can run in real time even on a CPU. We provide both our source code and audio samples in our GitHub repository.
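
To make the phrase "convolutional blocks with residual connections" concrete, here is a minimal pure-Python sketch of one such block. It is not taken from the authors' repository. The function names, the ReLU activation, the `dilation` parameter and the omission of any normalization layer are all assumptions made for illustration.

```python
from typing import List

Frames = List[List[float]]  # layout: [time][channels]


def conv1d_same(x: Frames, weights, bias: List[float], dilation: int = 1) -> Frames:
    """1-D convolution over time, zero-padded so the output length equals the input length.

    weights has layout [kernel][in_channels][out_channels]; the kernel size must be odd.
    """
    kernel = len(weights)
    half = (kernel // 2) * dilation  # how far the receptive field reaches on each side
    out: Frames = []
    for t in range(len(x)):
        frame = list(bias)  # start each output frame from the bias
        for k in range(kernel):
            src = t - half + k * dilation
            if 0 <= src < len(x):  # positions outside the sequence count as zeros
                for i, value in enumerate(x[src]):
                    for o in range(len(bias)):
                        frame[o] += value * weights[k][i][o]
        out.append(frame)
    return out


def relu(x: Frames) -> Frames:
    return [[max(0.0, v) for v in frame] for frame in x]


def residual_block(x: Frames, weights, bias: List[float], dilation: int = 1) -> Frames:
    """Hypothetical block: output = x + relu(conv(x)).

    The skip connection requires out_channels == in_channels.
    """
    y = relu(conv1d_same(x, weights, bias, dilation))
    return [[a + b for a, b in zip(fx, fy)] for fx, fy in zip(x, y)]


# Three time steps, one channel, kernel of size 3 with all weights equal to 1.
x = [[1.0], [2.0], [3.0]]
w = [[[1.0]], [[1.0]], [[1.0]]]
print(residual_block(x, w, [0.0]))
```

Because each output frame depends only on a fixed window of input frames, a block like this can process all time steps in parallel, unlike an autoregressive decoder that must generate frames one after another.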
