
Audio Captioning using Gated Recurrent Units


Document pages: 6 pages

Abstract: Audio captioning is a recently proposed task for automatically generating a textual description of a given audio clip. In this study, a novel deep network architecture with audio embeddings is presented to predict audio captions. To extract audio features in addition to log Mel energies, the VGGish audio embedding model is used to explore the usability of audio embeddings in the audio captioning task. The proposed architecture encodes the audio and text input modalities separately and combines them before the decoding stage. Audio encoding is conducted through a Bi-directional Gated Recurrent Unit (BiGRU), while a GRU is used for the text encoding phase. Following this, we evaluate our model on the newly published audio captioning performance dataset, Clotho, to compare the experimental results with the literature. Our experimental results show that the proposed BiGRU-based deep model outperforms the state-of-the-art results.
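
The data flow described in the abstract (a BiGRU over audio frames, a GRU over the caption words produced so far, and a merge before decoding) can be illustrated with a small standard-library Python sketch. This is not the authors' implementation. The abstract does not say how the two encodings are combined or decoded, so the concatenation, the use of final hidden states as summaries, the single softmax output layer, the omission of bias terms, and all sizes and names (`GRUCell`, `encode_audio`, `FEAT`, `HID`, and so on) are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Minimal GRU cell (biases omitted for brevity; an assumption)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        # Blend the previous state and the candidate using the update gate.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, sequence):
        """Return the final hidden state after consuming the sequence."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h


def encode_audio(frames, fwd, bwd):
    # BiGRU: one pass forward in time, one backward, states concatenated.
    return fwd.run(frames) + bwd.run(frames[::-1])


def predict_next_word(audio_vec, text_vec, W_out):
    # Assumed merge: concatenate both encodings, then a softmax layer.
    logits = matvec(W_out, audio_vec + text_vec)
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
FEAT, EMB, HID, VOCAB = 64, 16, 8, 10  # hypothetical sizes

audio_fwd = GRUCell(FEAT, HID, rng)
audio_bwd = GRUCell(FEAT, HID, rng)
text_gru = GRUCell(EMB, HID, rng)
W_out = rand_matrix(VOCAB, 3 * HID, rng)

# Stand-ins for per-frame audio features and embedded caption words.
frames = [[rng.random() for _ in range(FEAT)] for _ in range(5)]
words = [[rng.random() for _ in range(EMB)] for _ in range(3)]

audio_vec = encode_audio(frames, audio_fwd, audio_bwd)
text_vec = text_gru.run(words)
probs = predict_next_word(audio_vec, text_vec, W_out)
print(len(audio_vec), len(text_vec), len(probs))
```

In a trained system the frame vectors would be real audio features such as log Mel energies or VGGish embeddings, and the prediction step would be repeated word by word to build the caption.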
