An ASR Guided Speech Intelligibility Measure for TTS Model Selection

Document pages: 5 pages

Abstract: The perceptual quality of neural text-to-speech (TTS) is highly dependent on the choice of the model during training. A model selected with a training-objective metric, such as the least mean squared error, does not always correlate with human perception. In this paper, we propose an objective metric based on the phone error rate (PER) to select the TTS model with the best speech intelligibility. The PER is computed between the input text to the TTS model and the text decoded from the synthesized speech using an automatic speech recognition (ASR) model, which is trained on the same data as the TTS model. With the help of subjective studies, we show that the TTS model chosen with the least PER on the validation split has significantly higher speech intelligibility than the model with the least training-objective loss. Finally, using the proposed PER and subjective evaluation, we show that the choice of the best TTS model depends on the genre of the target domain text. All our experiments are conducted on a Hindi language dataset. However, the proposed model selection method is language independent.
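
The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the PER is computed, so this sketch assumes the conventional definition: the Levenshtein (edit) distance between the reference phone sequence and the ASR-decoded phone sequence, summed over the validation utterances and divided by the total number of reference phones. The function names, the checkpoint labels, and the idea that both texts have already been converted to phone sequences are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference phones
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phone_error_rate(
    pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]
) -> float:
    """Corpus-level PER: total edit errors / total reference phones.

    Each pair is (reference phones from the TTS input text,
    phones decoded by the ASR model from the synthesized speech).
    """
    total_errors = 0
    total_ref = 0
    for ref, hyp in pairs:
        total_errors += edit_distance(ref, hyp)
        total_ref += len(ref)
    if total_ref == 0:
        raise ValueError("reference phone sequences are empty")
    return total_errors / total_ref


def select_checkpoint(per_by_checkpoint: Dict[str, float]) -> str:
    """Return the (hypothetical) checkpoint name with the lowest PER."""
    return min(per_by_checkpoint, key=per_by_checkpoint.get)
```

In use, one would compute `phone_error_rate` on the validation split for each saved TTS checkpoint and pass the resulting dictionary to `select_checkpoint`, rather than picking the checkpoint with the lowest training-objective loss.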
