
Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling

  • king
  • 20210506

Document pages: 7 pages

Abstract: Detecting sound source objects within visual observation is important for autonomous robots to comprehend surrounding environments. Since sounding objects in our living environments vary widely in appearance, labeling all of them is impossible in practice. This calls for self-supervised learning, which does not require manual labeling. Most conventional self-supervised methods use monaural audio signals and images, and cannot distinguish sound source objects with similar appearances because monaural audio carries little spatial information. To solve this problem, this paper presents a self-supervised training method using 360° images and multichannel audio signals. By incorporating the spatial information in multichannel audio signals, our method trains deep neural networks (DNNs) to distinguish multiple sound source objects. Our system for localizing sound source objects in the image is composed of audio and visual DNNs. The visual DNN is trained to localize sound source candidates within an input image. The audio DNN verifies whether each candidate actually produces sound or not. These DNNs are jointly trained in a self-supervised manner based on a probabilistic spatial audio model. Experimental results with simulated data showed that the DNNs trained by our method localized multiple speakers. We also demonstrate that the visual DNN detected objects including talking visitors and specific exhibits from real data recorded in a science museum.
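
The division of labor described in the abstract (a visual stage that proposes candidates, an audio stage that verifies them) can be illustrated with a toy sketch. This is not the paper's probabilistic spatial audio model or its training procedure. The `Candidate` class, the `verify_candidates` function, the score product, and the threshold are all hypothetical stand-ins, and the numeric scores are made up for illustration rather than produced by any DNN.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for one output of the visual stage."""
    azimuth_deg: float   # assumed horizontal direction in the 360° image
    visual_score: float  # assumed visual confidence in [0, 1]


def verify_candidates(candidates, audio_activity, threshold=0.5):
    """Keep candidates whose combined visual and audio score passes a threshold.

    audio_activity holds one value in [0, 1] per candidate, standing in for
    the audio stage's estimate that sound arrives from that direction.
    The product rule and the threshold are illustrative assumptions.
    """
    if len(candidates) != len(audio_activity):
        raise ValueError("need exactly one audio activity value per candidate")
    kept = []
    for cand, activity in zip(candidates, audio_activity):
        if cand.visual_score * activity >= threshold:
            kept.append(cand)
    return kept


# Two objects that look alike (similar visual scores) but only one is sounding.
candidates = [Candidate(30.0, 0.9), Candidate(200.0, 0.8)]
audio_activity = [0.9, 0.1]
sounding = verify_candidates(candidates, audio_activity)
```

In this toy setting the two candidates are nearly indistinguishable by visual score alone, and only the direction-dependent audio value separates them, which is the role the abstract assigns to the spatial information in multichannel audio.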
