
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

  • 20210506


Document pages: 11 pages

Abstract: Speech synthesis (text to speech, TTS) and recognition (automatic speech recognition, ASR) are important speech tasks, and both require a large amount of paired text and speech data for model training. However, there are more than 6,000 languages in the world, and most of them lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages. In this paper, we develop LRSpeech, a TTS and ASR system for the extremely low-resource setting, which can support rare languages at low data cost. LRSpeech consists of three key techniques: 1) pre-training on rich-resource languages and fine-tuning on low-resource languages; 2) dual transformation between TTS and ASR, so that each iteratively boosts the accuracy of the other; 3) knowledge distillation to customize the TTS model on a high-quality target-speaker voice and to improve the ASR model on multiple voices. We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech. Experimental results show that LRSpeech 1) achieves high quality for TTS in terms of both intelligibility (more than 98% intelligibility rate) and naturalness (above 3.5 mean opinion score (MOS)) of the synthesized speech, which satisfies the requirements for industrial deployment, 2) achieves promising recognition accuracy for ASR, and 3) last but not least, uses extremely low-resource training data. We also conduct comprehensive analyses of LRSpeech with different amounts of data resources, and provide valuable insights and guidance for industrial deployment. We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS for more rare languages.
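
The two TTS quality figures quoted in the abstract (intelligibility rate above 98 and MOS above 3.5) can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes that the intelligibility rate is the percentage of test words that listeners judge intelligible, and that MOS is the plain average of listener ratings on the conventional 1 to 5 scale. All function names here are hypothetical.

```python
from statistics import mean


def intelligibility_rate(word_judgments):
    """Percentage of test words judged intelligible.

    Assumption: one boolean per test word (True = intelligible).
    """
    if not word_judgments:
        raise ValueError("need at least one judgment")
    return 100.0 * sum(word_judgments) / len(word_judgments)


def mean_opinion_score(ratings):
    """Average listener rating, assuming the conventional 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    return mean(ratings)


def meets_reported_bar(ir, mos, ir_min=98.0, mos_min=3.5):
    """Check against the thresholds quoted in the abstract (strictly above)."""
    return ir > ir_min and mos > mos_min


# Toy data for illustration only; these are not results from the paper.
ir = intelligibility_rate([True] * 99 + [False])
mos = mean_opinion_score([4, 4, 3, 4])
print(ir, mos, meets_reported_bar(ir, mos))
```

The thresholds default to the values stated in the abstract; the toy inputs only show how the two numbers would be computed from raw listener judgments.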
