A Transfer Learning End-to-End Arabic Text-To-Speech (TTS) Deep Architecture

  • king
  • 20210505

Document pages: 12 pages

Abstract: Speech synthesis is the artificial production of human speech. A typical text-to-speech system converts a language text into a waveform. There exist many English TTS systems that produce mature, natural, and human-like speech synthesizers. In contrast, other languages, including Arabic, have not been considered until recently. Existing Arabic speech synthesis solutions are slow, of low quality, and the naturalness of synthesized speech is inferior to the English synthesizers. They also lack essential speech key factors such as intonation, stress, and rhythm. Different works were proposed to solve those issues, including the use of concatenative methods such as unit selection or parametric methods. However, they required a lot of laborious work and domain expertise. Another reason for such poor performance of Arabic speech synthesizers is the lack of speech corpora, unlike English that has many publicly available corpora and audiobooks. This work describes how to generate high quality, natural, and human-like Arabic speech using an end-to-end neural deep network architecture. This work uses just $\langle$text, audio$\rangle$ pairs with a relatively small amount of recorded audio samples with a total of 2.41 hours. It illustrates how to use English character embedding despite using diacritic Arabic characters as input and how to preprocess these audio samples to achieve the best results.
