
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion

  • 20210507


Document pages: 5 pages

Abstract: The increased adoption of digital assistants makes text-to-speech (TTS) synthesis systems an indispensable feature of modern mobile devices. It is hence desirable to build a system capable of generating highly intelligible speech in the presence of noise. Past studies have investigated style conversion in TTS synthesis, yet degraded synthesized quality often leads to worse intelligibility. To overcome such limitations, we propose a novel transfer learning approach using Tacotron and WaveRNN based TTS synthesis. The proposed speech system exploits two modification strategies: (a) Lombard speaking style data and (b) Spectral Shaping and Dynamic Range Compression (SSDRC), which has been shown to provide high intelligibility gains by redistributing the signal energy in the time-frequency domain. We refer to this extension as the Lombard-SSDRC TTS system. Intelligibility enhancement, as quantified by the Intelligibility in Bits (SIIB-Gauss) measure, shows that the proposed Lombard-SSDRC TTS system achieves a significant relative improvement of between 110% and 130% in speech-shaped noise (SSN), and 47% to 140% in competing-speaker noise (CSN), against the state-of-the-art TTS approach. Additional subjective evaluation shows that Lombard-SSDRC TTS successfully increases speech intelligibility, with a relative improvement of 455% for SSN and 104% for CSN in median keyword correction rate compared to the baseline TTS method.
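
The relative improvement figures quoted in the abstract can be read as a percentage change over the baseline score. The sketch below is only an illustration of that arithmetic: the function name `relative_improvement` and the sample scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, enhanced: float) -> float:
    """Return the relative improvement of `enhanced` over `baseline`, in percent.

    Assumes both values are scores where higher is better (for example an
    intelligibility measure or a keyword rate). The name and signature are
    illustrative, not part of the paper.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    return 100.0 * (enhanced - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical scores chosen only to illustrate the scale of the numbers:
    # a baseline of 10.0 and an enhanced score of 55.5.
    gain = relative_improvement(10.0, 55.5)
    print(f"relative improvement: {gain:.0f}%")
```

Under this reading, a relative improvement above 100% means the enhanced system's score is more than double the baseline score.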
