
Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos

  • king
  • 20210506


Document pages: 16 pages

Abstract: Video action recognition has been partially addressed by CNNs that stack fixed-size 3D kernels. However, these methods may underperform because they capture only rigid spatial-temporal patterns in single-scale spaces while neglecting the scale variances across different action primitives. To overcome this limitation, we propose to learn the optimal-scale kernels from the data. More specifically, an action perceptron synthesizer is proposed to generate the kernels from a bag of fixed-size kernels that interact through dense routing paths. To guarantee the interaction richness and the information capacity of the paths, we design a novel optimized feature fusion layer. This layer establishes a principled universal paradigm that, for the first time, suffices to cover most of the current feature fusion techniques (e.g., channel shuffling and channel dropout). By inserting the synthesizer, our method can easily adapt traditional 2D CNNs to video understanding tasks such as action recognition at marginal additional computation cost. The proposed method is thoroughly evaluated on several challenging datasets (i.e., Something-Something, Kinetics, and Diving48) that heavily require temporal reasoning or appearance discrimination, achieving new state-of-the-art results. In particular, our low-resolution model outperforms the recent strong baseline methods, i.e., TSM and GST, with less than 30% of their computation cost.
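
For readers unfamiliar with the two fusion techniques the abstract names, the following is a minimal sketch of channel shuffling and channel dropout on a plain list of channels. It is not the paper's optimized feature fusion layer, whose formulation the abstract does not give; the function names `channel_shuffle` and `channel_dropout`, the `groups` and `drop_prob` parameters, and the use of a Python list in place of a feature tensor are all assumptions made for illustration.

```python
import random


def channel_shuffle(channels, groups):
    """Interleave channels across groups (hypothetical helper).

    Channels are viewed as `groups` contiguous blocks; the output takes
    the first channel of each block, then the second of each, and so on.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


def channel_dropout(channels, drop_prob, rng=None):
    """Replace each channel with None with probability `drop_prob`.

    None stands in for a zeroed-out feature map in this sketch.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    rng = rng or random.Random()
    return [None if rng.random() < drop_prob else c for c in channels]


if __name__ == "__main__":
    feature_channels = ["c0", "c1", "c2", "c3", "c4", "c5"]
    print(channel_shuffle(feature_channels, groups=2))
    print(channel_dropout(feature_channels, drop_prob=0.5, rng=random.Random(0)))
```

Both operations only reorder or mask channels, which is one way to see why a single, more general fusion layer could plausibly cover them as special cases.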
