
PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

  • king
  • 20210506


Document pages: 5 pages

Abstract: Neural network applications generally benefit from larger-sized models, but for current speech enhancement models, larger scale networks often suffer from decreased robustness to the variety of real-world use cases beyond what is encountered in training data. We introduce several innovations that lead to better large neural networks for speech enhancement. The novel PoCoNet architecture is a convolutional neural network that, with the use of frequency-positional embeddings, is able to more efficiently build frequency-dependent features in the early layers. A semi-supervised method helps increase the amount of conversational training data by pre-enhancing noisy datasets, improving performance on real recordings. A new loss function biased towards preserving speech quality helps the optimization better match human perceptual opinions on speech quality. Ablation experiments and objective and human opinion metrics show the benefits of the proposed improvements.
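
The abstract does not specify how the frequency-positional embeddings are constructed, so the following is only a minimal sketch of the general idea: extra input channels that tell a convolutional layer which frequency bin it is looking at. The cosine encoding, the function names (frequency_positional_embedding, append_embedding), and the default of four embedding channels are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def frequency_positional_embedding(num_bins: int, num_channels: int = 4) -> np.ndarray:
    """Return a (num_channels, num_bins) array that encodes the bin index.

    Assumption: cosines at doubling frequencies. The abstract does not
    give the actual encoding used by PoCoNet.
    """
    # Normalised position of each frequency bin in [0, 1].
    pos = np.arange(num_bins) / max(num_bins - 1, 1)
    # One cosine per embedding channel: cos(pi * 2**k * pos).
    scales = 2.0 ** np.arange(num_channels)
    return np.cos(np.pi * scales[:, None] * pos[None, :])


def append_embedding(spec: np.ndarray, num_channels: int = 4) -> np.ndarray:
    """Concatenate the embedding to a (channels, freq, time) spectrogram.

    The embedding depends only on frequency, so it is repeated
    unchanged across every time frame.
    """
    _, num_bins, num_frames = spec.shape
    emb = frequency_positional_embedding(num_bins, num_channels)
    emb = np.broadcast_to(emb[:, :, None], (num_channels, num_bins, num_frames))
    return np.concatenate([spec, emb], axis=0)


# Hypothetical input: real and imaginary STFT parts, 257 bins, 10 frames.
spec = np.zeros((2, 257, 10))
out = append_embedding(spec)
print(out.shape)
```

Because a plain convolution is translation-invariant along the frequency axis, channels like these are one way to let early layers treat low and high frequencies differently, which is the benefit the abstract attributes to the embeddings.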
