
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training

  • king
  • 20210506


Document pages: 5 pages

Abstract: Data-efficient voice cloning aims to synthesize a target speaker's voice with only a few enrollment samples at hand. To this end, speaker adaptation and speaker encoding are two typical methods, both built on a base model trained on multiple speakers. The former uses a small set of target speaker data to transfer the multi-speaker model to the target speaker's voice through direct model update, while in the latter, only a few seconds of the target speaker's audio goes directly through an extra speaker encoding model along with the multi-speaker model to synthesize the target speaker's voice without model update. However, both methods need clean target speaker data, whereas the samples provided by users may inevitably contain acoustic noise in real applications, and generating the target voice from noisy data remains challenging. In this paper, we study the data-efficient voice cloning problem from noisy samples under the sequence-to-sequence based TTS paradigm. Specifically, we introduce domain adversarial training (DAT) to speaker adaptation and speaker encoding, which aims to disentangle noise from the speech-noise mixture. Experiments show that for both speaker adaptation and encoding, the proposed approaches can consistently synthesize clean speech from noisy speaker samples, clearly outperforming a method that adopts a state-of-the-art speech enhancement module.
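
Domain adversarial training is commonly implemented with a gradient reversal layer placed between a shared encoder and an auxiliary domain classifier (here, a clean-versus-noisy classifier). The abstract does not describe the paper's architecture, so the following is only a minimal pure-Python sketch of that general idea under stated assumptions: the class `GradientReversal`, the helper `encoder_gradient`, and the weight `lam` are hypothetical names chosen for illustration, and gradients are passed around as plain lists rather than computed by an autograd framework.

```python
# Minimal sketch of a gradient reversal layer (GRL), a common building block
# of domain adversarial training. All names here are hypothetical and are not
# taken from the paper.

class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # assumed trade-off weight for the adversarial signal

    def forward(self, features):
        # Features reach the noise classifier unchanged.
        return list(features)

    def backward(self, grad_output):
        # The classifier's gradient is negated before it reaches the encoder,
        # pushing the encoder toward features the classifier cannot separate.
        return [-self.lam * g for g in grad_output]


def encoder_gradient(grad_from_tts, grad_from_noise_classifier, grl):
    """Sum the synthesis gradient and the reversed classifier gradient."""
    reversed_grad = grl.backward(grad_from_noise_classifier)
    return [a + b for a, b in zip(grad_from_tts, reversed_grad)]


grl = GradientReversal(lam=0.5)
total = encoder_gradient([1.0, 1.0, 1.0], [2.0, -4.0, 0.0], grl)
print(total)  # [0.0, 3.0, 1.0]
```

In this sketch the encoder still follows the synthesis gradient, while the reversed term works against the noise classifier, which is one way to encourage representations that carry speaker information but little noise information.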
