
Unsupervised Cross-lingual Representation Learning for Speech Recognition


Document pages: 12 pages

Abstract: This paper presents XLSR, which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages. We build on wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations and jointly learns a quantization of the latents shared across languages. The resulting model is fine-tuned on labeled data, and experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining. On the CommonVoice benchmark, XLSR shows a relative phoneme error rate reduction of 72% compared to the best known results. On BABEL, our approach improves word error rate by 16% relative compared to a comparable system. Our approach enables a single multilingual speech recognition model that is competitive with strong individual models. Analysis shows that the latent discrete speech representations are shared across languages, with increased sharing for related languages. We hope to catalyze research in low-resource speech understanding by releasing XLSR-53, a large model pretrained in 53 languages.
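
The contrastive task mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which the model's context vector at a masked position must pick out the true quantized latent from a set of distractors, scored by cosine similarity with a temperature. The function names, the toy vectors, and the temperature value of 0.1 are illustrative assumptions, not taken from this page or from the released model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-probability of the true latent among all candidates.

    context:     model output at a masked time step (hypothetical toy vector)
    target:      the true quantized latent for that time step
    distractors: quantized latents sampled from other time steps
    temperature: assumed value, chosen only for illustration
    """
    candidates = [target] + list(distractors)
    logits = [cosine(context, q) / temperature for q in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    # The true latent sits at index 0 of the candidate list.
    return log_norm - logits[0]


if __name__ == "__main__":
    ctx = [1.0, 0.0]
    # Context aligned with the true latent gives a loss close to zero.
    print(contrastive_loss(ctx, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))
    # Context aligned with a distractor gives a much larger loss.
    print(contrastive_loss(ctx, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]]))
```

The loss shrinks as the context vector moves closer to the true latent than to any distractor, which is the training signal a contrastive objective of this kind provides. Because the quantized latents are shared across languages, the same candidate space is used regardless of which language an utterance comes from.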
