
A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling

  • king
  • 20210506

Document pages: 5 pages

Abstract: This paper proposes a network architecture designed mainly for audio tagging, which can also be used for weakly supervised acoustic event detection (AED). The proposed network consists of a modified DenseNet as the feature extractor and a global average pooling (GAP) layer to predict frame-level labels at inference time. The architecture is inspired by the work of Zhou et al., a well-known framework that uses GAP to localize visual objects given only image-level labels. While most previous work on weakly supervised AED used recurrent layers with attention-based mechanisms to localize acoustic events, the proposed network localizes events directly from the feature map extracted by DenseNet, without any recurrent layers. In the audio tagging task of DCASE 2017, our method significantly outperforms the state-of-the-art method in F1 score, by 5.3 on the dev set and 6.0 on the eval set in absolute terms. For the weakly supervised AED task in DCASE 2018, our model outperforms the state-of-the-art method in event-based F1 by 8.1 on the dev set and 0.5 on the eval set in absolute terms, using data augmentation and tri-training to leverage unlabeled data.
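To make the GAP-based localization idea concrete, here is a minimal pure-Python sketch. It is not the paper's implementation. It assumes that the classifier after GAP is a single linear layer followed by a sigmoid, and that frame-level scores come from applying the same weights to the feature map averaged over frequency only, in the spirit of Zhou et al.'s class activation maps. It also assumes that events are decoded by simple thresholding. The function names, the 0.5 threshold, and the `hop_s` frame duration are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear(vec: Vector, weights: Matrix, bias: Vector) -> Vector:
    # weights is C x K (channels x event classes); returns K logits.
    return [sum(v * weights[c][k] for c, v in enumerate(vec)) + bias[k]
            for k in range(len(bias))]


def mean_vectors(vectors: List[Vector]) -> Vector:
    # Element-wise mean of equally sized channel vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def gap_predictions(feature_map: List[List[Vector]], weights: Matrix,
                    bias: Vector) -> Tuple[Vector, List[Vector]]:
    """feature_map[t][f] is the C-dim channel vector at frame t, frequency bin f
    (hypothetical stand-in for the DenseNet output)."""
    # Clip level (used for tagging): global average over every time-frequency
    # cell, then the classifier.
    all_cells = [cell for frame in feature_map for cell in frame]
    clip_logits = linear(mean_vectors(all_cells), weights, bias)
    # Frame level (used for AED): average over frequency only, keep time,
    # then apply the same classifier weights.
    frame_logits = [linear(mean_vectors(frame), weights, bias)
                    for frame in feature_map]
    return clip_logits, frame_logits


def frames_to_events(frame_probs: Vector, threshold: float,
                     hop_s: float) -> List[Tuple[float, float]]:
    # Turn runs of frames at or above the threshold into (onset, offset) in seconds.
    events, start = [], None
    for t, p in enumerate(frame_probs):
        if p >= threshold and start is None:
            start = t
        elif p < threshold and start is not None:
            events.append((start * hop_s, t * hop_s))
            start = None
    if start is not None:
        events.append((start * hop_s, len(frame_probs) * hop_s))
    return events


if __name__ == "__main__":
    # Toy feature map: T=4 frames, F=2 frequency bins, C=2 channels; K=1 class.
    fm = [
        [[1.0, 0.0], [1.0, 0.0]],
        [[3.0, 0.0], [1.0, 0.0]],
        [[3.0, 1.0], [3.0, 1.0]],
        [[0.0, 0.0], [0.0, 0.0]],
    ]
    w, b = [[1.0], [-1.0]], [-1.5]
    clip, frames = gap_predictions(fm, w, b)
    print(clip)    # [-0.25]
    print(frames)  # [[-0.5], [0.5], [0.5], [-1.5]]
    probs = [sigmoid(f[0]) for f in frames]
    print(frames_to_events(probs, threshold=0.5, hop_s=0.25))  # [(0.25, 0.75)]
```

Averaging and a linear layer commute, so the clip-level logit equals the mean of the frame-level logits (here -0.25). This is why a network trained only on clip-level tags can still produce per-frame scores at inference time.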
