
BatVision with GCC-PHAT Features for Better Sound to Vision Predictions


Document pages: 4 pages

Abstract: Inspired by sophisticated echolocation abilities found in nature, we train a generative adversarial network to predict plausible depth maps and grayscale layouts from sound. To achieve this, our sound-to-vision model processes binaural echo-returns from chirping sounds. We build upon previous work with BatVision that consists of a sound-to-vision model and a self-collected dataset using our mobile robot and low-cost hardware. We improve on the previous model by introducing several changes to the model, which leads to a better depth and grayscale estimation, and increased perceptual quality. Rather than using raw binaural waveforms as input, we generate generalized cross-correlation (GCC) features and use these as input instead. In addition, we change the model generator and base it on residual learning and use spectral normalization in the discriminator. We compare and present both quantitative and qualitative improvements over our previous BatVision model.
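To illustrate the GCC-PHAT features named in the title, the following is a minimal sketch of generalized cross-correlation with phase transform between two channels of a binaural recording. It is not the authors' implementation: the function names `dft` and `gcc_phat`, the `eps` parameter, and the zero-padding choice are assumptions made for illustration. The abstract does not state the sampling rate, the window length, or how the features are arranged as network input. The naive DFT stands in for an FFT routine so that the example needs only the standard library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(left, right, eps=1e-12):
    """Return (correlation, lags) for two equal-length channels.

    A positive lag at the correlation peak means `right` is delayed
    relative to `left` by that many samples.
    """
    n = len(left)
    size = 2 * n  # zero-pad so the circular correlation does not wrap around
    spec_left = dft(list(left) + [0.0] * n)
    spec_right = dft(list(right) + [0.0] * n)
    # Cross-power spectrum of the two channels.
    cross = [r * l.conjugate() for l, r in zip(spec_left, spec_right)]
    # PHAT weighting: keep only the phase, discard the magnitude.
    cross = [c / (abs(c) + eps) for c in cross]
    corr = [v.real for v in dft(cross, inverse=True)]
    # Reorder so that lags run from -(n - 1) to n - 1.
    lags = list(range(-(n - 1), n))
    corr = corr[size - (n - 1):] + corr[:n]
    return corr, lags


if __name__ == "__main__":
    # Toy example: the right channel is the left channel delayed by 2 samples.
    left = [0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0]
    right = [0.0, 0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0]
    corr, lags = gcc_phat(left, right)
    print("estimated delay:", lags[corr.index(max(corr))])
```

Because the phase transform discards spectral magnitude, the correlation peak depends on the relative delay between the two channels rather than on the loudness of the echo. This is one plausible reason such features could be easier for a network to use than raw waveforms.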
