
Waveform-based Voice Activity Detection Exploiting Fully Convolutional networks with Multi-Branched Encoders


Document pages: 5 pages

Abstract: In this study, we propose an encoder-decoder structured system with fully convolutional networks to implement voice activity detection (VAD) directly on the time-domain waveform. The proposed system processes the input waveform to classify each of its segments as either speech or non-speech. This novel waveform-based VAD algorithm, abbreviated "WVAD", has two main distinguishing features. First, compared with most conventional VAD systems, which use spectral features, the raw waveforms employed in WVAD contain more comprehensive information and are thus expected to facilitate more accurate speech/non-speech predictions. Second, based on the multi-branched architecture, WVAD can be extended with an ensemble of encoders, referred to as WEVAD, that incorporate multiple attribute information in utterances and can thus yield better VAD performance for specified acoustic conditions. We evaluated the presented WVAD and WEVAD on the VAD task using two datasets. First, the experiments conducted on AURORA2 reveal that WVAD outperforms many state-of-the-art VAD algorithms. Next, the TMHINT task confirms that, by combining multiple attributes in utterances, WEVAD performs even better than WVAD.
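
The data flow described in the abstract (raw waveform in, one or more convolutional encoder branches, a convolutional decoder, per-sample speech/non-speech decision out) can be illustrated with a toy sketch. This is not the authors' architecture: the function names (`conv1d_same`, `wevad_sketch`), the kernel widths, the random untrained weights, the averaging used to merge branches, and the 0.5 decision threshold are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def conv1d_same(signal, kernel):
    """1-D cross-correlation, zero-padded so the output length equals the input length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * (len(kernel) - 1 - half)
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_kernel(rng, width):
    """Hypothetical random (untrained) filter weights."""
    return [rng.uniform(-0.5, 0.5) for _ in range(width)]


def wevad_sketch(waveform, encoder_kernels, decoder_kernel):
    """One encoder kernel mimics a single-encoder setup; several mimic an ensemble."""
    # Each branch encodes the raw waveform independently.
    branches = [relu(conv1d_same(waveform, k)) for k in encoder_kernels]
    # Assumption: branches are merged by averaging; the source does not say how.
    merged = [sum(vals) / len(branches) for vals in zip(*branches)]
    # The decoder maps the merged features to a per-sample speech probability.
    return [sigmoid(v) for v in conv1d_same(merged, decoder_kernel)]


rng = random.Random(0)
waveform = [math.sin(0.3 * n) for n in range(64)]      # stand-in for audio samples
encoder_kernels = [make_kernel(rng, 5) for _ in range(3)]
decoder_kernel = make_kernel(rng, 5)

probs = wevad_sketch(waveform, encoder_kernels, decoder_kernel)
labels = [1 if p >= 0.5 else 0 for p in probs]         # 1 = speech, 0 = non-speech
```

Because every layer is a length-preserving convolution, the output has one probability per input sample, which is what allows the decision to be made directly on the time-domain signal. With untrained weights the labels are meaningless; the sketch only shows the shape of the computation.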
