Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines

  • king
  • 20210506

Document pages: 6 pages

Abstract: The novelty of this study consists in a multi-modality approach to scene classification, where image and audio complement each other in a process of deep late fusion. The approach is demonstrated on a difficult classification problem, consisting of two synchronised and balanced datasets of 16,000 data objects, encompassing 4.4 hours of video of 8 environments with varying degrees of similarity. We first extract video frames and accompanying audio at one-second intervals. The image and the audio datasets are then classified independently, using a fine-tuned VGG16 and an evolutionary optimised deep neural network, with accuracies of 89.27% and 93.72%, respectively. This is followed by late fusion of the two neural networks to enable a higher-order function, leading to an accuracy of 96.81% in this multi-modality classifier with synchronised video frames and audio clips. The tertiary neural network implemented for late fusion outperforms classical state-of-the-art classifiers by around 3% when the two primary networks are considered as feature generators. We show that situations where a single modality may be confused by anomalous data points are now corrected through an emerging higher-order integration. Prominent examples include a water feature in a city misclassified as a river by the audio classifier alone and a densely crowded street misclassified as a forest by the image classifier alone. Both are examples which are correctly classified by our multi-modality approach.
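
The late-fusion idea in the abstract can be illustrated with a minimal sketch: the outputs of the image network and the audio network are concatenated and passed to a third classifier. The abstract does not specify the tertiary network's architecture, so everything below is an assumption made for illustration only: a single dense softmax layer stands in for the tertiary network, three classes are used instead of eight, the function names (`softmax`, `dense`, `late_fusion`) are hypothetical, and the weights are hand-set rather than learned.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] is the weight row for output j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def late_fusion(image_features, audio_features, weights, biases):
    """Concatenate the two primary networks' outputs and classify the result.

    Assumption: a single dense softmax layer stands in for the tertiary
    network; the paper's actual fusion network is not described in the abstract.
    """
    fused = list(image_features) + list(audio_features)
    return softmax(dense(fused, weights, biases))


# Hypothetical 3-class example (the paper uses 8 environments).
# The image model slightly prefers class 0, the audio model strongly prefers class 1.
image_probs = [0.5, 0.4, 0.1]
audio_probs = [0.1, 0.8, 0.1]

# Hand-set weights (assumption): output j simply sums both modalities' scores
# for class j. In practice these weights would be learned from training data.
weights = [
    [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
]
biases = [0.0, 0.0, 0.0]

probs = late_fusion(image_probs, audio_probs, weights, biases)
predicted_class = probs.index(max(probs))
print(predicted_class, [round(p, 3) for p in probs])
```

In this toy case the confident audio prediction outweighs the weaker image prediction, which loosely mirrors how one modality can correct the other; a trained fusion network could learn richer interactions than this simple sum.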
