
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations


Document pages: 19 pages

Abstract: We show for the first time that learning powerful representations from speech audio alone, followed by fine-tuning on transcribed speech, can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations, which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100-hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data.
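The abstract only summarizes the pre-training objective. As a rough illustration, the pure-Python sketch below computes an InfoNCE-style contrastive loss at masked time steps. For each masked step, the context vector is scored by cosine similarity (divided by a temperature) against its own quantized latent and against distractors drawn from other masked steps, and the loss is the negative log-softmax score of the true target. The feature encoder, Transformer, and quantizer are not modeled. `contexts` and `quantized` stand in for hypothetical precomputed vectors. The function names, span-masking scheme, temperature, and distractor count are illustrative assumptions, not the authors' code.

```python
import math
import random
from typing import List, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def sample_mask(num_steps: int, start_prob: float, span: int, rng: random.Random) -> Set[int]:
    """Hypothetical span masking: each step starts a masked span of `span` steps with prob `start_prob`."""
    masked: Set[int] = set()
    for start in range(num_steps):
        if rng.random() < start_prob:
            masked.update(range(start, min(start + span, num_steps)))
    return masked


def contrastive_loss(context: Vector, target: Vector,
                     distractors: List[Vector], temperature: float = 0.1) -> float:
    """-log softmax score of the true quantized target among {target} + distractors."""
    candidates = [target] + list(distractors)
    logits = [cosine(context, q) / temperature for q in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


def masked_contrastive_loss(contexts: List[Vector], quantized: List[Vector],
                            masked: Set[int], num_distractors: int = 2,
                            temperature: float = 0.1, seed: int = 0) -> float:
    """Average contrastive loss over masked steps; distractors come from other masked steps (assumption)."""
    if not masked:
        raise ValueError("at least one time step must be masked")
    rng = random.Random(seed)
    steps = sorted(masked)
    losses = []
    for t in steps:
        others = [u for u in steps if u != t]
        k = min(num_distractors, len(others))
        distractors = [quantized[u] for u in rng.sample(others, k)]
        losses.append(contrastive_loss(contexts[t], quantized[t], distractors, temperature))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy example: one-hot "quantized" latents and contexts that match them exactly.
    q = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    mask = sample_mask(num_steps=4, start_prob=0.5, span=2, rng=random.Random(1))
    if mask:
        print(masked_contrastive_loss(q, q, mask))
```

A context that points at its own quantized target drives the loss toward zero, while one that points at a distractor is penalized. In this sketch, the temperature controls how sharply the softmax separates the two cases.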
