Evaluating the reliability of acoustic speech embeddings

  • king
  • 20210505

Document pages: 5 pages

Abstract: Speech embeddings are fixed-size acoustic representations of variable-length speech sequences. They are increasingly used for a variety of tasks ranging from information retrieval to unsupervised term discovery and speech segmentation. However, there is currently no clear methodology to compare or optimise the quality of these embeddings in a task-neutral way. Here, we systematically compare two popular metrics, ABX discrimination and Mean Average Precision (MAP), on 5 languages across 17 embedding methods, ranging from supervised to fully unsupervised, and using different loss functions (autoencoders, correspondence autoencoders, siamese). Then we use the ABX and MAP to predict performances on a new downstream task: the unsupervised estimation of the frequencies of speech segments in a given corpus. We find that overall, ABX and MAP correlate with one another and with frequency estimation. However, substantial discrepancies appear in the fine-grained distinctions across languages and/or embedding methods. This makes it unrealistic at present to propose a task-independent silver bullet method for computing the intrinsic quality of speech embeddings. There is a need for more detailed analysis of the metrics currently used to evaluate such embeddings.
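
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how an ABX error rate and a MAP score might be computed over a set of labelled embeddings. It is not the paper's implementation: the cosine distance, the exhaustive triplet enumeration, the tie handling, the function names (`abx_error`, `mean_average_precision`) and the toy data are all assumptions made for illustration.

```python
import numpy as np


def cosine_distances(emb):
    """Pairwise cosine distance matrix (assumed distance; the paper may use another)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def abx_error(emb, labels):
    """Fraction of (A, B, X) triplets where X is closer to B than to A.

    A and X share a label, B has a different label. Ties count as half an error.
    """
    dist = cosine_distances(emb)
    labels = np.asarray(labels)
    n = len(labels)
    errors, total = 0.0, 0
    for x in range(n):
        same = [a for a in range(n) if a != x and labels[a] == labels[x]]
        diff = [b for b in range(n) if labels[b] != labels[x]]
        for a in same:
            for b in diff:
                total += 1
                if dist[a, x] > dist[b, x]:
                    errors += 1.0
                elif dist[a, x] == dist[b, x]:
                    errors += 0.5
    return errors / total if total else float("nan")


def mean_average_precision(emb, labels):
    """Mean over queries of the average precision of same-label retrieval."""
    dist = cosine_distances(emb)
    labels = np.asarray(labels)
    n = len(labels)
    aps = []
    for q in range(n):
        others = np.array([i for i in range(n) if i != q])
        # Rank every other item by increasing distance to the query.
        order = others[np.argsort(dist[q, others], kind="stable")]
        relevant = (labels[order] == labels[q]).astype(float)
        if relevant.sum() == 0:
            continue  # a query with no same-label item has no defined AP
        precision_at_k = np.cumsum(relevant) / np.arange(1, len(order) + 1)
        aps.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(aps)) if aps else float("nan")


# Toy data (invented for illustration): two well-separated word types.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["a", "a", "b", "b"]

abx = abx_error(emb, labels)
map_score = mean_average_precision(emb, labels)
print(abx, map_score)
```

On this toy input the two clusters are cleanly separated, so the ABX error is 0 and the MAP is 1. The two scores need not move together on real embeddings, which is the kind of discrepancy the abstract reports.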
