Ultrasound-based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis

  • king
  • 20210506

Document pages: 5 pages

Abstract: For articulatory-to-acoustic mapping using deep neural networks, typically spectral and excitation parameters of vocoders have been used as the training targets. However, vocoding often results in buzzy and muffled final speech quality. Therefore, in this paper on ultrasound-based articulatory-to-acoustic conversion, we use a flow-based neural vocoder (WaveGlow) pre-trained on a large amount of English and Hungarian speech data. The inputs of the convolutional neural network are ultrasound tongue images. The training target is the 80-dimensional mel-spectrogram, which results in a finer detailed spectral representation than the previously used 25-dimensional Mel-Generalized Cepstrum. From the output of the ultrasound-to-mel-spectrogram prediction, WaveGlow inference results in synthesized speech. We compare the proposed WaveGlow-based system with a continuous vocoder which does not use a strict voiced/unvoiced decision when predicting F0. The results demonstrate that during the articulatory-to-acoustic mapping experiments, the WaveGlow neural vocoder produces significantly more natural synthesized speech than the baseline system. Besides, the advantage of WaveGlow is that F0 is included in the mel-spectrogram representation, and it is not necessary to predict the excitation separately.
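
To make the 80-dimensional training target concrete, the following is a minimal NumPy sketch of how a log mel-spectrogram with 80 bands can be computed from a waveform. It is not the authors' preprocessing code. The sample rate (22050 Hz), FFT size (1024), hop length (256), Hann window and HTK-style mel formula are all assumptions chosen for illustration, and the function names are hypothetical. Only the number of mel bands (80) comes from the abstract.

```python
import numpy as np


def hz_to_mel(hz):
    """HTK-style Hz -> mel conversion (an assumed convention)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels=80, n_fft=1024, sample_rate=22050):
    """Return an (n_mels, n_fft // 2 + 1) matrix of triangular filters."""
    nyquist = sample_rate / 2.0
    # n_mels triangles need n_mels + 2 edge frequencies, equally spaced in mel.
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(nyquist), n_mels + 2))
    fft_freqs = np.linspace(0.0, nyquist, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, centre, right = hz_edges[m], hz_edges[m + 1], hz_edges[m + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        # Triangle: the lower of the two slopes, clipped at zero.
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, n_mels=80, n_fft=1024, hop=256, sample_rate=22050):
    """Return a log mel-spectrogram of shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(n_mels, n_fft, sample_rate) @ power.T
    return np.log(np.maximum(mel, 1e-10))  # floor avoids log(0)


# One second of a 440 Hz tone stands in for recorded speech.
t = np.arange(22050) / 22050.0
tone = np.sin(2.0 * np.pi * 440.0 * t)
target = log_mel_spectrogram(tone)
```

In a system of the kind the abstract describes, each column of `target` would be the 80-dimensional vector that the network learns to predict from the ultrasound tongue image at the corresponding time, and the predicted sequence would then be passed to the neural vocoder. Because the harmonic structure of voiced speech is visible in the mel bands, no separate F0 track is needed as a target.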
