
On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network

  • king
  • 20210505


Document pages: 6 pages

Abstract: The advent of learning-based methods in speech enhancement has revived the need for robust and reliable training features that can compactly represent speech signals while preserving their vital information. Time-frequency domain features, such as the Short-Term Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC), are preferred in many approaches. While the MFCC provide for a compact representation, they ignore the dynamics and distribution of energy in each mel-scale subband. In this work, a speech enhancement system based on Generative Adversarial Network (GAN) is implemented and tested with a combination of Audio FingerPrinting (AFP) features obtained from the MFCC and the Normalized Spectral Subband Centroids (NSSC). The NSSC capture the locations of speech formants and complement the MFCC in a crucial way. In experiments with diverse speakers and noise types, GAN-based speech enhancement with the proposed AFP feature combination achieves the best objective performance while reducing memory requirements and training time.
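
To illustrate the idea of a subband centroid mentioned in the abstract, here is a minimal sketch in plain Python. It is not the paper's implementation: the equal-width linear subbands (instead of mel-scale subbands), the normalization of each centroid to the range 0 to 1 by its band edges, the naive DFT, and the function names `power_spectrum` and `normalized_subband_centroids` are all assumptions made for this example.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def normalized_subband_centroids(frame, num_bands=4):
    """Power-weighted mean bin index per subband, rescaled to [0, 1].

    Assumption: subbands are equal-width in linear frequency; the paper
    refers to mel-scale subbands, which are not reproduced here.
    """
    power = power_spectrum(frame)
    num_bins = len(power)
    # Integer band edges over the available bins (hypothetical layout).
    edges = [b * num_bins // num_bands for b in range(num_bands + 1)]
    centroids = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = power[lo:hi]
        total = sum(band)
        if total == 0.0 or hi - lo < 2:
            # No energy (or a single-bin band): fall back to the band centre.
            centroids.append(0.5)
            continue
        centroid = sum(k * p for k, p in zip(range(lo, hi), band)) / total
        # Map the centroid from [lo, hi - 1] onto [0, 1].
        centroids.append((centroid - lo) / (hi - 1 - lo))
    return centroids


# Example: a 64-sample frame with a single tone at DFT bin 5.
frame = [math.cos(2 * math.pi * 5 * t / 64) for t in range(64)]
nssc = normalized_subband_centroids(frame, num_bands=4)
```

In this sketch a band whose energy is concentrated near one frequency yields a centroid close to that frequency's relative position inside the band, which is the sense in which centroids can track spectral peaks such as formants.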
