Improved acoustic word embeddings for zero-resource languages using multilingual transfer

Document pages: 11 pages

Abstract: Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings, where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs. In a word discrimination task on six target languages, all of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision. When using only a few training languages, the multilingual CAE performs better, but with more training languages the other multilingual models perform similarly. Using more training languages is generally beneficial, but improvements are marginal on some languages. We present probing experiments which show that the CAE encodes more phonetic, word duration, language identity and speaker information than the other multilingual models.
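The word discrimination task mentioned above scores embeddings without training a recognizer. Here it is sketched as a same-different evaluation: every pair of word segments is compared, pairs are ranked by distance, and average precision (AP) measures how well same-word pairs rank above different-word pairs. This is a minimal sketch, not the paper's evaluation code. It assumes cosine distance and non-interpolated AP, and the function names (`cosine_distance`, `same_different_ap`) and toy two-dimensional vectors are hypothetical stand-ins for the output of any of the three multilingual models.

```python
import itertools
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def same_different_ap(embeddings, labels):
    """Average precision for a same-different word discrimination task.

    Assumption: all segment pairs are scored by cosine distance and ranked
    from most to least similar; a pair counts as relevant ("same") when
    both segments carry the same word label. Ties keep enumeration order.
    """
    pairs = []
    for i, j in itertools.combinations(range(len(embeddings)), 2):
        d = cosine_distance(embeddings[i], embeddings[j])
        pairs.append((d, labels[i] == labels[j]))
    pairs.sort(key=lambda p: p[0])  # smallest distance (most similar) first

    n_same = sum(1 for _, same in pairs if same)
    if n_same == 0:
        raise ValueError("need at least one same-word pair")

    # Non-interpolated AP: mean of the precision at the rank of each same pair.
    hits = 0
    precision_sum = 0.0
    for rank, (_, same) in enumerate(pairs, start=1):
        if same:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_same


if __name__ == "__main__":
    # Hypothetical toy embeddings: each word's two tokens point the same way.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(same_different_ap(emb, ["cat", "cat", "dog", "dog"]))  # 1.0
```

With good embeddings both same-word pairs are the two closest pairs, so AP is 1.0. If the labels are swapped to `["cat", "dog", "cat", "dog"]`, the same-word pairs fall to ranks 3 and 6 and AP drops to 1/3. A relative improvement in AP, as reported in the abstract, compares such scores between models on the same target-language data.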
