
COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations


Document pages: 8 pages

Abstract: Audio representation learning based on deep neural networks (DNNs) has emerged as an alternative to hand-crafted features. To achieve high performance, DNNs often need a large amount of annotated data, which can be difficult and costly to obtain. In this paper, we propose a method for learning audio representations by aligning the learned latent representations of audio and associated tags. Alignment is done by maximizing the agreement between the latent representations of audio and tags, using a contrastive loss. The result is an audio embedding model that reflects both acoustic and semantic characteristics of sounds. We evaluate the quality of our embedding model by measuring its performance as a feature extractor on three different tasks (namely, sound event recognition, music genre classification, and musical instrument classification), and we investigate what type of characteristics the model captures. Our results are promising, sometimes on par with the state of the art in the considered tasks, and the embeddings produced with our method are well correlated with some acoustic descriptors.
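The alignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the contrastive loss, so the code below assumes a common choice: a symmetric, temperature-scaled cross-entropy over cosine similarities, where the audio and tag embeddings of the same sound form the positive pair and all other pairings in the batch act as negatives. The function name `contrastive_alignment_loss`, the `temperature` value, and the embedding sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def contrastive_alignment_loss(audio_z, tag_z, temperature=0.1):
    """Hypothetical contrastive loss between audio and tag embeddings.

    audio_z, tag_z: arrays of shape (batch, dim); row i of each array is
    assumed to come from the same sound (a positive pair).
    This is an illustrative formulation, not necessarily the paper's.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = audio_z / np.linalg.norm(audio_z, axis=1, keepdims=True)
    t = tag_z / np.linalg.norm(tag_z, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = (a @ t.T) / temperature

    def cross_entropy_on_diagonal(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # The matching pair for row i sits at column i.
        return -np.mean(np.diag(log_prob))

    # Average the audio-to-tag and tag-to-audio directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio_embeddings = rng.normal(size=(8, 16))  # stand-in for encoder outputs
    tag_embeddings = rng.normal(size=(8, 16))    # stand-in for tag encoder outputs
    print(contrastive_alignment_loss(audio_embeddings, tag_embeddings))
```

Minimizing a loss of this kind pulls the two embeddings of the same sound together and pushes mismatched pairs apart, which is one way to "maximize the agreement" between the audio and tag latent spaces. In practice the embeddings would come from the trained audio and tag autoencoders rather than from random arrays.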
